A rule-based classifier classifies records using a collection of “IF … THEN …” rules.

e.g.: IF age = youth AND student = yes THEN buys_computer = yes

  • The ‘IF’ part, or left-hand side, of a rule is known as the rule antecedent or precondition, whereas the ‘THEN’ part, or right-hand side, is the rule consequent.
  • In the rule antecedent, the condition consists of one or more attribute tests.
  • If the condition in a rule antecedent holds true for a given tuple, the antecedent is satisfied and the rule is said to cover the tuple.
  • Coverage of a rule is the fraction of records that satisfy the antecedent of a rule.

Coverage = Ncovers / |D|

Where,

Ncovers = number of records covered by the rule (i.e., that satisfy its antecedent)

|D| = total number of records in the data set.

  • Accuracy of a rule is the fraction of records covered by the rule that satisfy both its antecedent and its consequent.

Accuracy = Ncorrect / Ncovers

Where,

Ncorrect = number of records that are correctly classified by the rule

Ncovers = number of records covered by the rule
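The two measures above can be computed directly. A minimal Python sketch, where the helper names, attribute names, and toy records are illustrative assumptions, not from the text:

```python
# Coverage and accuracy of a single rule on a toy dataset (illustrative data).

def rule_matches(antecedent, record):
    """True if the record satisfies every attribute test in the antecedent."""
    return all(record.get(attr) == value for attr, value in antecedent.items())

def coverage_and_accuracy(antecedent, consequent, dataset, class_attr="class"):
    covered = [r for r in dataset if rule_matches(antecedent, r)]
    correct = [r for r in covered if r[class_attr] == consequent]
    coverage = len(covered) / len(dataset)                       # Ncovers / |D|
    accuracy = len(correct) / len(covered) if covered else 0.0   # Ncorrect / Ncovers
    return coverage, accuracy

data = [
    {"age": "youth",  "student": "yes", "class": "yes"},
    {"age": "youth",  "student": "yes", "class": "no"},
    {"age": "senior", "student": "no",  "class": "no"},
    {"age": "youth",  "student": "no",  "class": "no"},
]

# Rule: IF age = youth AND student = yes THEN buys_computer = yes
cov, acc = coverage_and_accuracy({"age": "youth", "student": "yes"}, "yes", data)
print(cov, acc)  # 0.5 0.5 — the rule covers 2 of 4 records and gets 1 of those 2 right
```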

How does Rule-Based Classifier work?

  • If a rule is satisfied by a tuple, the rule is said to be triggered. Triggering doesn’t always mean firing, because more than one rule may be satisfied by the same tuple.
  • Three different cases occur for classification.

Case-I: If only one rule is satisfied

  • When an instance is covered by only one rule, that rule fires and returns the class prediction defined by its consequent.

Case-II: If more than one rule is satisfied

  • If more than one rule is triggered, we need a conflict resolution strategy to decide which rule fires.
  • Rule ordering (also called rule ranking or rule priority) can be set to resolve such conflicts. A rule ordering may be class-based or rule-based.
  • Rule-based ordering: Individual rules are ranked based on their quality.
  • Class-based ordering: Rules that belong to the same class appear together
  • When rule-based ordering is used, the rule set is known as a decision list.

Case-III: If no rule is satisfied

  • If an instance is not triggered by any rule, a default class is assigned to it. Mostly, the most frequent class is used as the default class.

E.g.:

| S.No. | Name   | Blood Type | Give Birth | Can fly | Live in water | Class |
|-------|--------|------------|------------|---------|---------------|-------|
| 1     | Lemur  | Warm       | Yes        | No      | No            | ?     |
| 2     | Turtle | Cold       | No         | No      | Sometimes     | ?     |
| 3     | Shark  | Cold       | Yes        | No      | Yes           | ?     |

Rule base

R1: (Give Birth = No) ^ (Can fly = Yes) => Birds

R2: (Give Birth = No) ^ (Live in Water = Yes) => Fishes

R3: (Give Birth = Yes) ^ (Blood Type = Warm) => Mammals

R4: (Give Birth = No) ^ (Can fly = No) => Reptiles

R5: (Live in Water = Sometimes) => Amphibians

  • In the above example, R1 and R2 do not cover any of the instances, while R3, R4 and R5 have coverage.
  • Instance 1 is triggered by R3, instance 2 is triggered by R4 & R5, and instance 3 is not triggered by any rule.
  • Since instance 1 is triggered by only one rule (R3), that rule fires and the instance is classified as Mammals. Instance 2 is triggered by two rules (R4 & R5), so a conflict occurs; to resolve it, the class can be identified using priority (rule priority or class priority). Instance 3 is not triggered by any rule; to resolve this, the default class can be used.
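The three cases can be sketched in Python with the animal rule base above. Function and variable names are assumptions, rule priority is taken to be the listing order R1 > R2 > … > R5, and the default class label is made up:

```python
# Rule base R1..R5 from the text, applied to the three instances.
RULES = [  # (name, antecedent, class) in priority order
    ("R1", {"Give Birth": "No",  "Can fly": "Yes"},        "Birds"),
    ("R2", {"Give Birth": "No",  "Live in water": "Yes"},  "Fishes"),
    ("R3", {"Give Birth": "Yes", "Blood Type": "Warm"},    "Mammals"),
    ("R4", {"Give Birth": "No",  "Can fly": "No"},         "Reptiles"),
    ("R5", {"Live in water": "Sometimes"},                 "Amphibians"),
]
DEFAULT_CLASS = "Unknown"  # assumed default class label

def classify(record):
    triggered = [(name, cls) for name, ante, cls in RULES
                 if all(record.get(a) == v for a, v in ante.items())]
    if not triggered:          # Case III: no rule satisfied -> default class
        return DEFAULT_CLASS
    return triggered[0][1]     # Cases I and II: highest-priority triggered rule fires

lemur  = {"Blood Type": "Warm", "Give Birth": "Yes", "Can fly": "No", "Live in water": "No"}
turtle = {"Blood Type": "Cold", "Give Birth": "No",  "Can fly": "No", "Live in water": "Sometimes"}
shark  = {"Blood Type": "Cold", "Give Birth": "Yes", "Can fly": "No", "Live in water": "Yes"}

print(classify(lemur))   # Mammals  (only R3 triggers)
print(classify(turtle))  # Reptiles (R4 and R5 trigger; R4 wins by priority)
print(classify(shark))   # Unknown  (no rule triggers, so the default class is used)
```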

Characteristics of Rule-Based Classifier

  1. Mutually exclusive Rules
    • Classifier contains mutually exclusive rules if all the rules are independent of each other.
    • Every record is covered by at most one rule.
    • Rules are no longer mutually exclusive if a record may be triggered by more than one rule. To make the rules mutually exclusive, we apply rule ordering.

  2. Exhaustive Rules

  • Classifier has exhaustive coverage if it accounts for every possible combination of attribute values (every possible rule).
  • Each record is covered by at least one rule.
  • Rules are no longer exhaustive if a record may not trigger any rule. To make the rules exhaustive, use a default class.

Building Classification Rules

  • Two approaches are used to build classification rules.

A.     Direct Method

- Extract rules directly from data. It is an inductive and sequential approach.

Sequential Covering

  1. Start from an empty rule
  2. Grow a rule using the Learn-One-Rule function
  3. Remove training records covered by the rule
  4. Repeat Step (2) and (3) until stopping criterion is met
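The four steps above can be sketched as follows. This is a toy version: the Learn-One-Rule here is a simple greedy search that repeatedly adds the single attribute test giving the best rule accuracy, which is an assumption for illustration (not the actual CN2 or RIPPER search), and the data set is made up:

```python
# Toy sequential covering for one target class (illustrative names and data).

def matches(ante, rec):
    return all(rec.get(a) == v for a, v in ante.items())

def accuracy(ante, data, target, class_attr):
    covered = [r for r in data if matches(ante, r)]
    if not covered:
        return 0.0
    return sum(r[class_attr] == target for r in covered) / len(covered)

def learn_one_rule(data, target, class_attr="class"):
    """Step 2: grow a rule, greedily adding the best attribute test."""
    ante = {}
    while True:
        best, best_acc = None, accuracy(ante, data, target, class_attr)
        for rec in data:
            for a, v in rec.items():
                if a == class_attr or a in ante:
                    continue
                cand = dict(ante, **{a: v})
                acc = accuracy(cand, data, target, class_attr)
                if acc > best_acc:
                    best, best_acc = cand, acc
        if best is None:       # no conjunct improves the rule any further
            return ante
        ante = best

def sequential_covering(data, target, class_attr="class"):
    rules, remaining = [], list(data)        # Step 1: start from no rules
    while any(r[class_attr] == target for r in remaining):  # Step 4: stopping criterion
        rule = learn_one_rule(remaining, target, class_attr)
        if not rule:
            break
        rules.append(rule)
        remaining = [r for r in remaining if not matches(rule, r)]  # Step 3
    return rules

data = [
    {"outlook": "sunny", "windy": "no",  "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "stay"},
    {"outlook": "rain",  "windy": "no",  "class": "stay"},
    {"outlook": "sunny", "windy": "no",  "class": "play"},
]
print(sequential_covering(data, "play"))  # [{'outlook': 'sunny', 'windy': 'no'}]
```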

Aspects of Sequential Covering

  • Rule Growing
  • Instance Elimination
  • Rule Evaluation
  • Stopping Criterion
  • Rule Pruning

i.     Rule Growing

CN2 Algorithm:

  • Start from an empty conjunct: {}
  • Add conjuncts that minimize the entropy measure: {A}, {A,B}, …
  • Determine the rule consequent by taking majority class of instances covered by the rule

RIPPER Algorithm:

  • Start from an empty rule: {} => class
  • Add conjuncts that maximize FOIL’s information gain measure:

R0: {} => class (initial rule)

R1: {A} => class (rule after adding conjunct)

Gain(R0, R1) = t × [ log2(p1/(p1+n1)) – log2(p0/(p0+n0)) ]

Where,

t: number of positive instances covered by both R0 and R1

p0: number of positive instances covered by R0

n0: number of negative instances covered by R0

p1: number of positive instances covered by R1

n1: number of negative instances covered by R1
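The gain formula can be computed directly. A small sketch, assuming log base 2 and that t = p1 (since R1 specializes R0, every positive instance R1 covers is also covered by R0); the counts in the example are made up for illustration:

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for refining rule R0 into R1.

    t is taken as p1, on the assumption that R1 specializes R0, so the
    positives covered by both rules are exactly those covered by R1.
    """
    t = p1
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Illustrative counts: R0 covers 100 positives and 400 negatives;
# after adding conjunct A, R1 covers 30 positives and 10 negatives.
print(round(foil_gain(p0=100, n0=400, p1=30, n1=10), 3))  # 57.207
```

The gain is large here because the refinement raises the positive fraction from 100/500 to 30/40 while still covering 30 positives.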

ii.   Instance Elimination

  • We need to eliminate instances; otherwise, the next rule learned would be identical to the previous rule.
  • We remove positive instances to ensure that the next rule is different.
  • We remove negative instances to prevent underestimating the accuracy of the rule.

iii. Rule Evaluation

    • Rules are typically evaluated with metrics such as accuracy, the Laplace estimate, or the m-estimate, which penalize rules that cover very few records.

iv. Stopping Criterion and Rule Pruning

Stopping Criterion

    • Compute the gain
    • If gain is not significant, discard the new rule.

Rule Pruning

  • Similar to post-pruning of decision trees.
  • Reduced Error Pruning:

Remove one of the conjuncts in the rule

Compare error rate on validation set before and after pruning

If error improves, prune the conjunct
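The three pruning steps above can be sketched for a single rule. A minimal sketch, assuming rules are attribute-value dictionaries; the function names, rule, and validation records are illustrative:

```python
# Reduced-error pruning of one rule: drop a conjunct whenever doing so
# does not increase the error rate on a validation set.

def matches(ante, rec):
    return all(rec.get(a) == v for a, v in ante.items())

def error_rate(ante, consequent, validation, class_attr="class"):
    covered = [r for r in validation if matches(ante, r)]
    if not covered:
        return 1.0  # a rule covering nothing is useless; treat as maximal error
    wrong = sum(r[class_attr] != consequent for r in covered)
    return wrong / len(covered)

def prune_rule(ante, consequent, validation):
    ante = dict(ante)
    improved = True
    while improved:
        improved = False
        base = error_rate(ante, consequent, validation)
        for attr in list(ante):
            trial = {a: v for a, v in ante.items() if a != attr}  # remove one conjunct
            if trial and error_rate(trial, consequent, validation) <= base:
                ante = trial     # error did not get worse: keep the simpler rule
                improved = True
                break
    return ante

val = [  # illustrative validation set
    {"Refund": "No",  "Marital Status": "Married", "class": "Loan"},
    {"Refund": "Yes", "Marital Status": "Married", "class": "Loan"},
    {"Refund": "No",  "Marital Status": "Single",  "class": "NoLoan"},
]
rule = {"Refund": "No", "Marital Status": "Married"}
print(prune_rule(rule, "Loan", val))  # {'Marital Status': 'Married'}
```

Here the `Refund = No` conjunct is pruned because `Marital Status = Married` alone makes no additional errors on the validation set.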

B. Indirect Method:

Extract rules from other classification models (e.g., decision trees, neural networks).

 

E.g.: Rule Extraction from Decision Tree

Rules:

R1: (Refund = Yes) => Loan

R2: (Refund = No) ^ (Marital Status = Married) => Loan
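One way to perform this extraction is to walk every root-to-leaf path of the tree and emit the tests along the path as a rule antecedent. A sketch, where the small nested-tuple tree is an assumed reconstruction of the Refund / Marital Status tree (the original figure is not shown in the text):

```python
# Extract one rule per root-to-leaf path of a decision tree.
# Internal node: (attribute, {value: child, ...}); leaf: class label string.
TREE = ("Refund", {
    "Yes": "Loan",
    "No": ("Marital Status", {
        "Married": "Loan",
        "Single": "NoLoan",
    }),
})

def extract_rules(node, path=()):
    if isinstance(node, str):            # leaf: the path so far becomes a rule
        return [(list(path), node)]
    attr, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(extract_rules(child, path + ((attr, value),)))
    return rules

for antecedent, cls in extract_rules(TREE):
    tests = " ^ ".join(f"({a} = {v})" for a, v in antecedent)
    print(f"{tests} => {cls}")
```

On this tree the first two extracted rules are exactly R1 and R2 above; the third path yields the remaining `(Refund = No) ^ (Marital Status = Single) => NoLoan` rule.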

Rule simplification

Complex rules can be simplified. In the above example, R2 can be simplified as:

r2: (Marital Status = Married) => Loan

Advantages of Rule-Based Classifiers

  • As expressive as decision trees
  • Easy to interpret
  • Easy to generate
  • Can classify new instances rapidly
  • Performance comparable to decision trees