Algorithm

The ID3 algorithm begins with the original dataset as the root node.
On each iteration, the algorithm iterates through every unused attribute of the dataset and calculates the entropy (or the information gain) of that attribute.
It then selects the attribute which has the smallest entropy (or, equivalently, the largest information gain) value.
The dataset is then split by the selected attribute to produce subsets of the data.
The algorithm continues to recur on each subset, considering only attributes never selected before.
Recursion on a subset may stop in one of these cases:
- Every element in the subset belongs to the same class; the node is then turned into a leaf and labeled with the class of the examples.
- There are no attributes left to select; the node is then turned into a leaf and labeled with the most common class of the examples in the subset.

If the examples do not belong to the same class and attributes remain, the algorithm:
- calculates entropy, and hence information gain, to select the best attribute on which to split the data,
- partitions the data into subsets, and
- recursively repeats on each subset until a stopping case applies.
Throughout the algorithm, the decision tree is constructed with each non-terminal node representing the selected attribute on which the data was split, and terminal nodes representing the class label of the final subset of this branch.
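The following is a minimal sketch of this procedure in Python; it is not taken from the original description. It assumes each example is a dict mapping attribute names to values, with a parallel list of class labels, and it represents the tree as nested dicts. The entropy and information_gain helpers implement the formulas defined in the sections below.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    # H(S) = sum over classes x of -p(x) * log2 p(x)
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    # IG(S, A) = H(S) - sum over subsets t of p(t) * H(t)
    subsets = defaultdict(list)
    for row, label in zip(rows, labels):
        subsets[row[attribute]].append(label)
    remainder = sum(len(t) / len(labels) * entropy(t) for t in subsets.values())
    return entropy(labels) - remainder

def id3(rows, labels, attributes):
    # Stopping case: every example belongs to the same class -> leaf node.
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping case: no attributes left -> leaf labeled with the most common class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Select the attribute with the largest information gain
    # (equivalently, the smallest entropy after splitting).
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # Split on each observed value of the selected attribute and recurse,
    # never reusing an attribute on the same branch.
    for value in {row[best] for row in rows}:
        sub_rows = [r for r in rows if r[best] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == value]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)
    return tree
```

In the returned structure, internal dict keys are the selected attribute and its observed values, and leaves are class labels, matching the tree described above.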
Entropy
Entropy H(S) is a measure of the amount of uncertainty in the dataset S (i.e. entropy characterizes the dataset S):

$$H(S) = \sum_{x \in X} -p(x) \log_2 p(x)$$

Where,
- S - the current dataset for which entropy is being calculated (this changes at every iteration of the ID3 algorithm)
- X - the set of classes in S
- p(x) - the proportion of the number of elements in class x to the number of elements in set S
- When H(S) = 0, the set S is perfectly classified (i.e. all elements in S are of the same class).
- In ID3, entropy is calculated for each remaining attribute. The attribute with the smallest entropy is used to split the set S on this iteration.
- The higher the entropy, the greater the potential to improve the classification by splitting.
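As a worked example (with hypothetical counts, not taken from the source), a set S of 14 examples in which 9 belong to one class and 5 to the other has

$$H(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940 \text{ bits},$$

close to the 1-bit maximum for two classes, so a good split on this set can remove almost a full bit of uncertainty.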
Information Gain
Information gain IG(S, A) is the measure of the difference in entropy from before to after the set S is split on an attribute A. In other words, it is how much uncertainty in S was reduced after splitting set S on attribute A:

$$IG(S, A) = H(S) - \sum_{t \in T} p(t)\, H(t)$$

Where,
- H(S) - entropy of set S
- T - the subsets created from splitting set S by attribute A, such that $S = \bigcup_{t \in T} t$
- p(t) - the proportion of the number of elements in t to the number of elements in set S
- H(t) - entropy of subset t
In ID3, information gain can be calculated (instead of entropy) for each remaining attribute. The attribute with the largest information gain is used to split the set on this iteration.
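Continuing the hypothetical 14-example set from above, suppose an attribute A splits S into a subset t_1 of 8 examples (6 of one class, 2 of the other) and a subset t_2 of 6 examples (3 of each class). Then H(t_1) ≈ 0.811, H(t_2) = 1.0, and

$$IG(S, A) \approx 0.940 - \frac{8}{14}(0.811) - \frac{6}{14}(1.0) \approx 0.048 \text{ bits},$$

so A removes little uncertainty here, and ID3 would prefer any remaining attribute with a larger gain.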