- Hunt's algorithm grows a decision tree recursively by partitioning the training data into successively smaller subsets.
- Let Dt be the set of training records that reach a node t. The general recursive procedure is defined as:
- If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt
- If Dt is an empty set, then t is a leaf node labeled by the default class, yd
- If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets.
- The procedure is applied recursively to each subset until all the records in a subset belong to the same class.
- Hunt's algorithm assumes that each combination of attribute values has a unique class label during the procedure.
- If all the records associated with Dt have identical attribute values except for the class label, then it is not possible to split these records any further. In this case, the node is declared a leaf node with the same class label as the majority class of training records associated with this node.
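The recursive procedure above can be sketched in Python. This is a minimal illustration, not a production implementation: records are assumed to be (attribute-dict, label) pairs, and the attribute test is a deliberately naive placeholder (split on the first attribute whose values differ), standing in for a real splitting criterion such as the Gini Index discussed below.

```python
from collections import Counter

def hunts(records, default_class=None):
    """Grow a decision tree following Hunt's recursive procedure.

    `records` is a list of (attribute_dict, label) pairs.
    """
    # Case: Dt is empty -> leaf labeled with the default class yd
    if not records:
        return {"leaf": default_class}

    labels = [y for _, y in records]
    majority = Counter(labels).most_common(1)[0][0]

    # Case: all records belong to the same class yt -> leaf labeled yt
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}

    # Degenerate case: identical attribute values but mixed labels ->
    # leaf labeled with the majority class
    attrs = [tuple(sorted(x.items())) for x, _ in records]
    if len(set(attrs)) == 1:
        return {"leaf": majority}

    # Case: more than one class -> pick an attribute test and recurse.
    # Placeholder test: first attribute whose values differ among records.
    split_attr = next(a for a in records[0][0]
                      if len({x[a] for x, _ in records}) > 1)
    tree = {"split": split_attr, "children": {}}
    for v in {x[split_attr] for x, _ in records}:
        subset = [(x, y) for x, y in records if x[split_attr] == v]
        tree["children"][v] = hunts(subset, default_class=majority)
    return tree
```

For example, three weather records that mix classes get split on the first differing attribute, and each resulting subset becomes a pure leaf.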
Tree induction is based on a greedy strategy, i.e., split the records based on an attribute test that optimizes a certain criterion.
1. How to split the records?
2. How to specify the attribute test condition?
- Depends on the number of ways to split the records, e.g. 2-way split / multi-way split.
- Depends upon the attribute type (Nominal, Ordinal, Continuous).
3. When to stop splitting?
- When all records belong to the same class or all records have identical attribute values.
4. How to determine the best split?
- Nodes with a homogeneous class distribution are preferred.
- Measure the node impurity, e.g. using the Misclassification Error or the Gini Index.
- The Gini Index measures the impurity of a data set D as:
- Gini(D) = 1 - Σᵢ₌₁ⁿ pᵢ²
- where n = number of classes, pᵢ = probability (relative frequency) of the ith class in D.
- It considers a binary split for each attribute.
- When D is partitioned into D1 and D2, the weighted Gini Index of the split is
Gini_split(D) = |D1|/|D| Gini(D1) + |D2|/|D| Gini(D2)
- The attribute that maximizes the reduction in impurity (equivalently, minimizes the weighted Gini Index of the split) is selected as the splitting attribute.
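The Gini formulas above can be checked with a short worked sketch (class labels are arbitrary strings here; the function names are illustrative, not from any library):

```python
from collections import Counter

def gini(labels):
    # Gini(D) = 1 - sum over classes i of p_i^2,
    # where p_i is the relative frequency of class i in D
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(d1, d2):
    # Weighted Gini of the binary partition D -> D1, D2:
    # |D1|/|D| * Gini(D1) + |D2|/|D| * Gini(D2)
    n = len(d1) + len(d2)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)

labels = ["+"] * 4 + ["-"] * 2
print(gini(labels))                      # 1 - (4/6)^2 - (2/6)^2 = 0.444...
print(gini_split(["+"] * 4, ["-"] * 2))  # perfectly pure split -> 0.0
```

A split that separates the two classes perfectly drives the weighted Gini to 0, so it gives the maximum reduction in impurity and would be chosen over any split whose partitions remain mixed.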