A rule-based classifier classifies records by using a collection of "IF … THEN …" rules.
e.g.: IF age = youth AND student = yes THEN buys_computer = yes.
- The 'IF' part, or left-hand side, of a rule is known as the rule antecedent or precondition, whereas the 'THEN' part, or right-hand side, is the rule consequent.
- In the rule antecedent, the condition consists of one or more attribute tests.
- If the condition in a rule antecedent holds true for a given tuple, the antecedent is satisfied and the rule is said to cover the tuple.
- Coverage of a rule is the fraction of records that satisfy the antecedent of a rule.
Coverage = Ncovers / D
Where,
Ncovers = number of records that satisfy the antecedent of the rule.
D = total number of records in the data set.
- Accuracy of a rule is the fraction of covered records that satisfy both the antecedent and the consequent of the rule.
Accuracy = Ncorrect / Ncovers
Where,
Ncorrect = Number of records that are correctly classified by the rule
Ncovers = Number of records covered by the rule (satisfying its antecedent)
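A minimal sketch of these two measures in Python, using a hypothetical four-record data set (the records are assumptions; only the buys_computer rule comes from the example above):

```python
# A minimal sketch of rule coverage and accuracy.
# The toy records are hypothetical; the rule itself
# (IF age = youth AND student = yes THEN buys_computer = yes) is from the text.

records = [
    {"age": "youth",  "student": "yes", "buys_computer": "yes"},
    {"age": "youth",  "student": "yes", "buys_computer": "no"},
    {"age": "youth",  "student": "no",  "buys_computer": "no"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
]

def antecedent(r):
    # IF age = youth AND student = yes
    return r["age"] == "youth" and r["student"] == "yes"

def consequent(r):
    # THEN buys_computer = yes
    return r["buys_computer"] == "yes"

covered = [r for r in records if antecedent(r)]   # tuples the rule covers
correct = [r for r in covered if consequent(r)]   # covered AND correctly classified

coverage = len(covered) / len(records)   # Ncovers / D
accuracy = len(correct) / len(covered)   # Ncorrect / Ncovers

print(f"coverage = {coverage:.2f}")   # 2/4 = 0.50
print(f"accuracy = {accuracy:.2f}")   # 1/2 = 0.50
```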
How does a Rule-Based Classifier work?
- If a rule's antecedent is satisfied by a tuple, the rule is said to be triggered. Triggering does not always mean firing, because more than one rule may be satisfied by the same tuple.
- Three different cases occur for classification.
Case-I: If only one rule is satisfied
- When an instance is covered by exactly one rule, that rule fires, returning the class prediction for the tuple defined by the rule.
Case-II: If more than one rules are satisfied
- If more than one rule is triggered, we need a conflict resolution strategy to decide which rule fires.
- Rule ordering (also called rule ranking or rule priority) can be set to resolve such conflicts. A rule ordering may be class-based or rule-based.
- Rule-based ordering: Individual rules are ranked based on their quality.
- Class-based ordering: Rules that belong to the same class appear together
- When rule-based ordering is used, the rule set is known as a decision list.
Case-III: If no rule is satisfied
- If an instance is not triggered by any rule, a default class is assigned. Usually the most frequent class is used as the default class.
E.g.:

| S.No. | Name   | Blood Type | Give Birth | Can fly | Live in water | Class |
|-------|--------|------------|------------|---------|---------------|-------|
| 1     | Lemur  | Warm       | Yes        | No      | No            | ?     |
| 2     | Turtle | Cold       | No         | No      | Sometimes     | ?     |
| 3     | Shark  | Cold       | Yes        | No      | Yes           | ?     |
Rule base
R1: (Give Birth = No) ^ (Can fly = Yes) => Birds
R2: (Give Birth = No) ^ (Live in Water = Yes) => Fishes
R3: (Give Birth = Yes) ^ (Blood Type = Warm) => Mammals
R4: (Give Birth = No) ^ (Can fly = No) => Reptiles
R5: (Live in Water = Sometimes) => Amphibians
- In the above example, R1 and R2 do not cover any of the given instances, whereas R3, R4, and R5 have coverage.
- Instance 1 is triggered by R3, instance 2 is triggered by R4 and R5, and instance 3 is not triggered by any rule.
- Since instance 1 is triggered by only one rule (R3), R3 fires and instance 1 is classified as a mammal. Instance 2 is triggered by two rules (R4 and R5), so a conflict occurs; to resolve it, the class is chosen using priority (rule priority or class priority). Instance 3 is not triggered by any rule, so the default class is assigned.
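This walkthrough can be expressed directly in code. The sketch below applies the rule base with rule-based ordering (R1 highest priority) and a default class; both the priority order and the default label are assumptions made for illustration:

```python
# A minimal sketch of applying the rule base above with rule-based ordering.
# The priority order (R1 first) and the default class are assumptions;
# the text only says some ordering and default must be chosen.

rules = [
    ("Birds",      lambda x: x["give_birth"] == "no"  and x["can_fly"] == "yes"),        # R1
    ("Fishes",     lambda x: x["give_birth"] == "no"  and x["live_in_water"] == "yes"),  # R2
    ("Mammals",    lambda x: x["give_birth"] == "yes" and x["blood_type"] == "warm"),    # R3
    ("Reptiles",   lambda x: x["give_birth"] == "no"  and x["can_fly"] == "no"),         # R4
    ("Amphibians", lambda x: x["live_in_water"] == "sometimes"),                         # R5
]

DEFAULT_CLASS = "Unknown"   # assumed default; usually the most frequent class

def classify(x):
    # Fire the first (highest-priority) rule whose antecedent is satisfied;
    # fall back to the default class when no rule is triggered.
    for label, antecedent in rules:
        if antecedent(x):
            return label
    return DEFAULT_CLASS

lemur  = {"blood_type": "warm", "give_birth": "yes", "can_fly": "no", "live_in_water": "no"}
turtle = {"blood_type": "cold", "give_birth": "no",  "can_fly": "no", "live_in_water": "sometimes"}
shark  = {"blood_type": "cold", "give_birth": "yes", "can_fly": "no", "live_in_water": "yes"}

print(classify(lemur))   # Mammals  (only R3 triggers)
print(classify(turtle))  # Reptiles (R4 and R5 trigger; R4 wins by ordering)
print(classify(shark))   # Unknown  (no rule triggers -> default class)
```

Because the rules are scanned in priority order and the first triggered rule fires, the rule set behaves as a decision list.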
Characteristics of Rule-Based Classifier
1. Mutually Exclusive Rules
- Classifier contains mutually exclusive rules if all the rules are independent of each other.
- Every record is covered by at most one rule.
- Rules are no longer mutually exclusive if a record may be triggered by more than one rule. To make rules mutually exclusive, we apply rule ordering.
2. Exhaustive Rules
- Classifier has exhaustive coverage if it accounts for every possible combination of attribute values (every possible rule).
- Each record is covered by at least one rule.
- Rules are no longer exhaustive if a record may not trigger any rule. To make the rule set exhaustive, we use a default class.
Building Classification Rules
- Two approaches are used to build classification rules.
A. Direct Method
- Extract rules directly from data. It is an inductive and sequential approach.
Sequential Covering
1. Start from an empty rule.
2. Grow a rule using the Learn-One-Rule function.
3. Remove training records covered by the rule.
4. Repeat steps (2) and (3) until the stopping criterion is met.
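A high-level sketch of this loop, assuming a hypothetical learn_one_rule() helper that greedily grows the best single rule on the remaining records (the helper and the rule's covers() method are assumptions):

```python
# A sketch of sequential covering. learn_one_rule(records, target_class)
# is assumed to return the best single rule for the current records,
# or None when no rule meets the quality threshold.

def sequential_covering(records, target_class, learn_one_rule, max_rules=20):
    rule_list = []
    remaining = list(records)
    for _ in range(max_rules):                    # safety bound on rule count
        rule = learn_one_rule(remaining, target_class)
        if rule is None:                          # stopping criterion met
            break
        rule_list.append(rule)
        # Step (3): remove the training records covered by the new rule,
        # so the next rule is learned on the uncovered remainder.
        remaining = [r for r in remaining if not rule.covers(r)]
        if not remaining:
            break
    return rule_list
```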
Aspects of Sequential Covering
- Rule Growing
- Instance Elimination
- Rule Evaluation
- Stopping Criterion
- Rule Pruning
i. Rule Growing
CN2 Algorithm:
- Start from an empty conjunct: {}
- Add conjuncts that minimize the entropy measure: {A}, {A,B}, …
- Determine the rule consequent by taking majority class of instances covered by the rule
RIPPER Algorithm:
- Start from an empty rule: {} => class
- Add conjuncts that maximize FOIL's information gain measure:
R0: {} => class (initial rule)
R1: {A} => class (rule after adding conjunct)
Gain(R0, R1) = t × [ log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) ]
Where,
t = number of positive instances covered by both R0 and R1
p0 = number of positive instances covered by R0
n0 = number of negative instances covered by R0
p1 = number of positive instances covered by R1
n1 = number of negative instances covered by R1
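A direct transcription of this measure, with a small worked example (the instance counts are made up for illustration):

```python
from math import log2

# FOIL's information gain, exactly as defined above.
def foil_gain(p0, n0, p1, n1):
    # p0/n0: positives/negatives covered by R0 (before adding the conjunct)
    # p1/n1: positives/negatives covered by R1 (after adding the conjunct)
    # Since R1 specializes R0, every positive covered by R1 is also
    # covered by R0, so t = p1.
    t = p1
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# e.g. R0 covers 100 positives and 400 negatives; adding a conjunct
# leaves R1 covering 30 positives and 10 negatives:
print(foil_gain(p0=100, n0=400, p1=30, n1=10))   # ≈ 57.2 (higher is better)
```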
ii. Instance Elimination
- We need to eliminate covered instances; otherwise, the next rule learned will be identical to the previous rule.
- We remove positive instances to ensure that the next rule is different.
- We remove negative instances to prevent underestimating the accuracy of the rule.
iii. Rule Evaluation
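A commonly used rule-quality measure is the Laplace estimate; the sketch below is an illustrative choice, not the only option. Unlike raw accuracy (Ncorrect / Ncovers), it penalizes rules with very low coverage:

```python
# A minimal sketch of the Laplace measure, a coverage-aware rule-quality
# metric common in the literature (chosen here as an illustrative example).
def laplace(n_correct, n_covers, k):
    # n_correct: covered records whose class matches the rule consequent
    # n_covers:  records covered by the rule's antecedent
    # k:         number of classes
    return (n_correct + 1) / (n_covers + k)

# A rule covering a single record correctly looks perfect under raw
# accuracy (1/1), but the Laplace measure tempers it toward the prior:
print(laplace(1, 1, 2))     # ≈ 0.67, vs. raw accuracy 1.00
print(laplace(50, 60, 2))   # ≈ 0.82, vs. raw accuracy ≈ 0.83
```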
iv. Stopping Criterion and Rule Pruning
Stopping criterion:
- Compute the gain of the new rule.
- If the gain is not significant, discard the new rule.
Rule Pruning
- Similar to post-pruning of decision trees.
- Reduced Error Pruning:
Remove one of the conjuncts in the rule
Compare error rate on validation set before and after pruning
If error improves, prune the conjunct
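A minimal sketch of this procedure for a single rule, assuming the rule is represented as a list of conjuncts and that an error(conjuncts, validation_set) function is available (both are assumptions for illustration):

```python
# Reduced error pruning for one rule. error(conjuncts, val_set) is an
# assumed helper returning the misclassification rate on a validation set.

def prune_rule(conjuncts, val_set, error):
    improved = True
    while improved and len(conjuncts) > 1:
        improved = False
        base_err = error(conjuncts, val_set)
        for i in range(len(conjuncts)):
            candidate = conjuncts[:i] + conjuncts[i + 1:]   # drop one conjunct
            if error(candidate, val_set) < base_err:        # error improves
                conjuncts, improved = candidate, True       # keep the pruned rule
                break
    return conjuncts
```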
B. Indirect Method:
Extract rules from other classification models (e.g., decision trees, neural networks).
E.g.: Rule Extraction from a Decision Tree
Rules:
R1: (Refund = Yes) => Loan
R2: (Refund = No) ^ (Marital Status = Married) => Loan
Rule simplification
Complex rules can be simplified. In the above example, R2 can be simplified as:
r2: (Marital Status = Married) => Loan
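As an illustration of the indirect method, the sketch below extracts one rule per root-to-leaf path from a scikit-learn decision tree trained on a tiny hypothetical data set (the features, labels, and fitted tree are assumptions; only the idea of converting tree paths to rules comes from the text):

```python
# A sketch of the indirect method: one rule per root-to-leaf path
# of a trained scikit-learn decision tree. Data and labels are made up.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])   # toy features: [Refund, Married]
y = np.array(["No", "No", "Yes", "No"])          # hypothetical Loan labels

clf = DecisionTreeClassifier().fit(X, y)
t = clf.tree_

def extract_rules(node=0, conds=()):
    if t.children_left[node] == -1:              # leaf: emit one rule per path
        label = clf.classes_[np.argmax(t.value[node])]
        print(" ^ ".join(conds) or "(true)", "=>", label)
        return
    name, thr = f"X{t.feature[node]}", t.threshold[node]
    extract_rules(t.children_left[node],  conds + (f"({name} <= {thr:.1f})",))
    extract_rules(t.children_right[node], conds + (f"({name} > {thr:.1f})",))

extract_rules()   # prints one rule per leaf of the fitted tree
```

Rules extracted this way are mutually exclusive and exhaustive by construction, since every record follows exactly one root-to-leaf path.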
Advantages of Rule-Based Classifiers
- As highly expressive as decision trees
- Easy to interpret
- Easy to generate
- Can classify new instances rapidly
- Performance comparable to decision trees