A rule-based classifier classifies records using a collection of “IF … THEN …” rules.

E.g., IF age = youth AND student = yes THEN buys_computer = yes.

  • The ‘If’ part, or left-hand side, of a rule is known as the rule antecedent or precondition, whereas the ‘Then’ part, or right-hand side, is the rule consequent.
  • In the rule antecedent, the condition consists of one or more attribute tests.
  • If the condition in a rule antecedent holds true for a given tuple, we say that the rule antecedent is satisfied and that the rule covers the tuple.
  • The coverage of a rule is the fraction of records that satisfy the antecedent of the rule.

Coverage = Ncovers / |D|

Ncovers = number of records that satisfy the rule antecedent (i.e., that the rule covers).

|D| = total number of records in the data set.

  • The accuracy of a rule is the fraction of records that satisfy both the antecedent and the consequent of the rule.

Accuracy = Ncorrect / Ncovers

Ncorrect = number of records that are correctly classified by the rule (i.e., satisfy both antecedent and consequent).

Ncovers = number of records that satisfy the rule antecedent.
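Both measures are straightforward to compute. Below is a minimal Python sketch; the toy records, attribute names, and rule are illustrative assumptions built around the buys_computer example above, not data from the text:

```python
# Minimal sketch: computing coverage and accuracy of a single rule.
# The toy dataset and attribute names are illustrative assumptions.

records = [
    {"age": "youth",  "student": "yes", "buys_computer": "yes"},
    {"age": "youth",  "student": "no",  "buys_computer": "no"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
    {"age": "youth",  "student": "yes", "buys_computer": "no"},
]

# Rule: IF age = youth AND student = yes THEN buys_computer = yes
antecedent = {"age": "youth", "student": "yes"}
consequent = ("buys_computer", "yes")

covered = [r for r in records if all(r[a] == v for a, v in antecedent.items())]
n_covers = len(covered)
n_correct = sum(1 for r in covered if r[consequent[0]] == consequent[1])

coverage = n_covers / len(records)   # fraction of all records covered
accuracy = n_correct / n_covers      # fraction of covered records classified correctly

print(f"coverage = {coverage:.2f}, accuracy = {accuracy:.2f}")
# coverage = 0.50, accuracy = 0.50 -> 2 of 4 records covered, 1 of those 2 correct
```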

How does a Rule-Based Classifier work?

  • If a rule’s antecedent is satisfied by a tuple, the rule is said to be triggered. Triggering does not always mean firing, because more than one rule may be satisfied by the same tuple.
  • Three different cases can occur during classification.

Case-I: If only one rule is satisfied

  • When an instance is covered by only one rule, that rule fires, returning the class prediction for the tuple defined by the rule.

Case-II: If more than one rule is satisfied

  • If more than one rule is triggered, we need a conflict-resolution strategy to decide which rule fires.
  • Rule ordering (also called rule ranking or rule priority) is used in this case. A rule ordering may be class-based or rule-based.
  • Rule-based ordering: individual rules are ranked based on their quality.
  • Class-based ordering: rules that belong to the same class appear together.
  • When rule-based ordering is used, the rule set is known as a decision list.

Case-III: If no rule is satisfied

  • If an instance is not triggered by any rule, a default class is used; most often the most frequent class in the training data is assigned as the default class.




[Table: three test instances to classify, described by the attributes Blood Type, Give Birth, Can Fly, and Live in Water.]

Rule base

R1: (Give Birth = No) ^ (Can fly = Yes) => Birds

R2: (Give Birth = No) ^ (Live in Water = Yes) => Fishes

R3: (Give Birth = Yes) ^ (Blood Type = Warm) => Mammals

R4: (Give Birth = No) ^ (Can fly = No) => Reptiles

R5: (Live in Water = Sometimes) => Amphibians

  • In the above example, R1 and R2 do not cover any of the test instances, whereas R3, R4, and R5 do.
  • Instance 1 triggers R3, instance 2 triggers R4 and R5, and instance 3 does not trigger any rule.
  • Since instance 1 triggers only one rule (R3), that rule fires and the instance is classified as a mammal. Instance 2 triggers more than one rule (R4 and R5), so a conflict occurs; to resolve it, the class can be identified using priority (rule priority or class priority). Instance 3 triggers no rule, so the default class is used, as the sketch below illustrates.
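A minimal Python sketch of this whole process, encoding R1–R5 as an ordered decision list where the first triggered rule fires and uncovered instances get a default class. The three instance records are hypothetical values chosen to reproduce the triggering pattern described above, since the original table values are not shown:

```python
# Minimal sketch: the rule base above as an ordered decision list.
# Conflict resolution = rule ordering (first triggered rule fires);
# uncovered instances receive the default class.

rules = [
    ("R1", lambda x: x["give_birth"] == "no"  and x["can_fly"] == "yes",       "Birds"),
    ("R2", lambda x: x["give_birth"] == "no"  and x["live_in_water"] == "yes", "Fishes"),
    ("R3", lambda x: x["give_birth"] == "yes" and x["blood_type"] == "warm",   "Mammals"),
    ("R4", lambda x: x["give_birth"] == "no"  and x["can_fly"] == "no",        "Reptiles"),
    ("R5", lambda x: x["live_in_water"] == "sometimes",                        "Amphibians"),
]

DEFAULT_CLASS = "Unknown"  # in practice, the most frequent training class

def classify(x):
    for name, antecedent, consequent in rules:
        if antecedent(x):            # rule is triggered ...
            return name, consequent  # ... and, being first in the order, fires
    return "default", DEFAULT_CLASS

# Hypothetical test instances consistent with the triggering pattern above.
instances = [
    {"blood_type": "warm", "give_birth": "yes", "can_fly": "no", "live_in_water": "no"},
    {"blood_type": "cold", "give_birth": "no",  "can_fly": "no", "live_in_water": "sometimes"},
    {"blood_type": "cold", "give_birth": "yes", "can_fly": "no", "live_in_water": "yes"},
]

for i, x in enumerate(instances, 1):
    print(f"instance {i}: fired {classify(x)}")
# instance 1: fired ('R3', 'Mammals')
# instance 2: fired ('R4', 'Reptiles')   <- R4 outranks R5 in this ordering
# instance 3: fired ('default', 'Unknown')
```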

Characteristics of Rule-Based Classifier

  1. Mutually exclusive Rules
    • Classifier contains mutually exclusive rules if all the rules are independent of each other.
    • Every record is covered by at most one rule.
    • Rules are no longer mutually exclusive if a record may be triggered by more than one rule. To make them mutually exclusive, we apply rule ordering.

  2. Exhaustive Rules
    • Classifier has exhaustive coverage if it accounts for every possible combination of attribute values (every possible rule).
    • Each record is covered by at least one rule.
    • Rules are no longer exhaustive if a record may not trigger any rule. To make the rule set exhaustive, we use a default class.
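Both properties can be checked mechanically over a dataset: a rule set is mutually exclusive if no record triggers two rules, and exhaustive if every record triggers at least one. A minimal sketch, reusing the (name, predicate, class) rule representation from the earlier sketch (helper names are illustrative):

```python
# Minimal sketch: checking mutual exclusivity and exhaustiveness.
# `rules` is a list of (name, antecedent_predicate, class) triples,
# `data` is a list of attribute dicts.

def triggered(rules, x):
    """Names of all rules whose antecedent the record x satisfies."""
    return [name for name, antecedent, _ in rules if antecedent(x)]

def mutually_exclusive(rules, data):
    return all(len(triggered(rules, x)) <= 1 for x in data)

def exhaustive(rules, data):
    return all(len(triggered(rules, x)) >= 1 for x in data)
```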

Building Classification Rules

  • Two approaches are used to build classification rules.

A. Direct Method

- Extract rules directly from data. It is an inductive and sequential approach.

Sequential Covering

  1. Start from an empty rule
  2. Grow a rule using the Learn-One-Rule function
  3. Remove training records covered by the rule
  4. Repeat steps (2) and (3) until the stopping criterion is met
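The loop itself is short; below is a minimal Python skeleton, where `learn_one_rule` and `covers` are illustrative placeholders for the Learn-One-Rule function and the antecedent test whose details are discussed next:

```python
# Minimal sketch of sequential covering. `learn_one_rule` stands in for
# the Learn-One-Rule function; `covers(rule, x)` tests the antecedent.

def sequential_covering(data, target_class, learn_one_rule, covers, min_coverage=1):
    rules = []
    remaining = list(data)                               # step 1: empty rule list
    while remaining:
        rule = learn_one_rule(remaining, target_class)   # step 2: grow a rule
        covered = [x for x in remaining if covers(rule, x)]
        if len(covered) < min_coverage:                  # step 4: stopping criterion
            break
        rules.append(rule)
        remaining = [x for x in remaining                # step 3: remove covered
                     if not covers(rule, x)]             #          training records
    return rules
```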

Aspects of Sequential Covering

  • Rule Growing
  • Instance Elimination
  • Rule Evaluation
  • Stopping Criterion
  • Rule Pruning

i. Rule Growing

CN2 Algorithm:

  • Start from an empty conjunct: {}
  • Add conjuncts that minimize the entropy measure: {A}, {A,B}, …
  • Determine the rule consequent by taking the majority class of the instances covered by the rule (see the sketch below).
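A minimal sketch of the entropy-based scoring that this conjunct selection relies on, assuming categorical attributes and a greedy one-step search; function and variable names are illustrative, not CN2’s actual implementation:

```python
# Minimal sketch: score candidate conjuncts by the entropy of the class
# distribution among the records they cover (lower entropy is better).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_conjunct(data, labels, candidates, current):
    """Greedily pick the (attribute, value) test that, added to the
    current conjunct set, minimizes entropy over the covered records."""
    def covered(conjuncts):
        return [i for i, x in enumerate(data)
                if all(x[a] == v for a, v in conjuncts)]
    scored = []
    for cand in candidates:
        idx = covered(current + [cand])
        if idx:  # skip conjuncts that cover nothing
            scored.append((entropy([labels[i] for i in idx]), cand))
    return min(scored)[1] if scored else None
```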

RIPPER Algorithm:

  • Start from an empty rule: {} => class
  • Add conjuncts that maximize FOIL’s information gain measure.

R0: {} => class (initial rule)

R1: {A} => class (rule after adding a conjunct)

Gain(R0, R1) = t × [ log2(p1 / (p1 + n1)) − log2(p0 / (p0 + n0)) ]

where,

t: number of positive instances covered by both R0 and R1

p0: number of positive instances covered by R0

n0: number of negative instances covered by R0

p1: number of positive instances covered by R1

n1: number of negative instances covered by R1
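The gain is a direct function of these five counts; a minimal sketch, with example numbers that are illustrative assumptions:

```python
# Minimal sketch: FOIL's information gain for a candidate refinement.
# t      = positive instances covered by both R0 and R1
# p0, n0 = positive/negative instances covered by R0
# p1, n1 = positive/negative instances covered by R1
from math import log2

def foil_gain(t, p0, n0, p1, n1):
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# E.g. (illustrative counts): R0 covers 100 positives and 400 negatives;
# adding conjunct A leaves R1 covering 90 positives and 10 negatives.
print(foil_gain(t=90, p0=100, n0=400, p1=90, n1=10))
# 90 * (log2(0.9) - log2(0.2)) ~= 195.3
```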

ii. Instance Elimination

  • We need to eliminate covered instances; otherwise, the next rule extracted will be identical to the previous rule.
  • We remove positive instances to ensure that the next rule is different.
  • We remove negative instances to prevent underestimating the accuracy of the rule.

iii. Rule Evaluation

  • Rules are evaluated with metrics that balance accuracy against coverage, such as plain accuracy, the Laplace measure, or the m-estimate.

iv. Stopping Criterion and Rule Pruning

Stopping criterion:

    • Compute the gain
    • If gain is not significant, discard the new rule.

Rule Pruning

  • Similar to post-pruning of decision trees.
  • Reduced Error Pruning:
    1. Remove one of the conjuncts in the rule.
    2. Compare the error rate on the validation set before and after pruning.
    3. If the error improves, prune the conjunct.
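A minimal sketch of reduced-error pruning for a single rule, assuming a rule is a list of (attribute, value) conjuncts with a predicted class, and a held-out validation set of labelled dicts (all names are illustrative):

```python
# Minimal sketch: reduced-error pruning of one rule's conjuncts.
# Error is measured on the records the (pruned) rule covers in a
# held-out validation set.

def covers(conjuncts, x):
    return all(x[a] == v for a, v in conjuncts)

def rule_error(conjuncts, predicted, validation, label_key="class"):
    covered = [x for x in validation if covers(conjuncts, x)]
    if not covered:
        return 1.0  # a rule covering nothing is useless
    wrong = sum(1 for x in covered if x[label_key] != predicted)
    return wrong / len(covered)

def prune_rule(conjuncts, predicted, validation):
    improved = True
    while improved and len(conjuncts) > 1:
        improved = False
        base = rule_error(conjuncts, predicted, validation)
        for i in range(len(conjuncts)):
            candidate = conjuncts[:i] + conjuncts[i + 1:]  # drop one conjunct
            if rule_error(candidate, predicted, validation) < base:
                conjuncts, improved = candidate, True      # prune only if error improves
                break
    return conjuncts
```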

B. Indirect Method

Extract rules from other classification models (e.g., decision trees, neural networks).


E.g., Rule Extraction from a Decision Tree


R1: (Refund = Yes) => Loan

R2: (Refund = No) ^ (Marital Status = Married) => Loan

Rule simplification

Complex rules can be simplified. In the above example, R2 can be simplified to:

R2′: (Marital Status = Married) => Loan
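As a concrete illustration of the indirect method, the sketch below walks a fitted scikit-learn decision tree and emits one rule per root-to-leaf path. The toy data, labels, and feature names are illustrative assumptions, not the tree from the example above:

```python
# Minimal sketch: one rule per root-to-leaf path of a fitted
# sklearn DecisionTreeClassifier. The toy data is illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])  # [Refund, Married] as 0/1 flags
y = np.array([0, 0, 1, 0])                      # 1 = loan approved (toy labels)
feature_names = ["Refund", "Married"]

clf = DecisionTreeClassifier().fit(X, y)
tree = clf.tree_

def extract_rules(node=0, conjuncts=()):
    if tree.children_left[node] == -1:           # leaf: emit one rule
        klass = clf.classes_[tree.value[node][0].argmax()]
        print(" ^ ".join(conjuncts) or "(true)", "=>", klass)
        return
    name = feature_names[tree.feature[node]]
    thr = tree.threshold[node]
    extract_rules(tree.children_left[node],  conjuncts + (f"({name} <= {thr:.1f})",))
    extract_rules(tree.children_right[node], conjuncts + (f"({name} > {thr:.1f})",))

extract_rules()
# prints one rule per leaf, e.g. (Refund > 0.5) => 0
```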

Advantages of Rule-Based Classifiers

  • As highly expressive as decision trees
  • Easy to interpret
  • Easy to generate
  • Can classify new instances rapidly
  • Performance comparable to decision trees