• Given a set of transactions T, the goal of association rule mining is to find all rules having support ≥ minsup threshold and confidence ≥ minconf threshold.
  • Some of approaches for association rules mining are:

1. Brute- Force Approach

  • List all possible association rules.
  • Compute the support and confidence for each rule.
  • Prune rules that fail to minimum support and minimum confidence level.

*This approach is computationally very expensive.

2. Frequent Itemset Generation Strategies

  • Reduce the number of candidates (M): For complete search, M=2d. Use pruning techniques to reduce M.
  • Reduce the number of transactions (N): Reduce size of N as the size of itemset increases.
  • Reduce the number of comparisons (NM): Use efficient data structures to store the candidates or transactions. No need to match every candidate against every transaction.

3.  Apriori Approach

  • Apriori approach is two step approach: Frequent item generation and Rules generation
  • Based on apriori principal

Apriori Principle:

  • Supersets of non-frequent item are also non-frequent. Or, If an itemset is frequent, then all of its subset also be frequent.
  • Apriori algorithm is an influential algorithm for mining frequent itemset.
  • It use a level-wise search, k-itemsets are used to explore k+1 itemset.
  • At first, the set of frequent itemset is found and used to generate to frequent itemset at next level and so on.

Apriori Algorithm:

  • Read the transaction database and get support for each itemset, compare the support with minimum support to generate frequent itemset at level 1.
  • Use join to generate a set of candidate k-itmesets at next level.
  • Generate frequent ietmsets at next level using minimum support.
  • Repeat step 2 and 3 until no frequent itme sets can be generated.
  • Generate rules form frequent itemsets from level 2 onwards using minimum confidence.

**This approach has faster than Brute-Force approach but still has higher computational complexity.

Candidate counting:

  • Scan the database of transactions to determine the support of each candidate itemset
  • To reduce the number of comparisons, store the candidates in a hash structure 
  • Instead of matching each transaction against every candidate, match it against candidates contained in the hashed buckets

4. Hash Table

  • A hash table (hash map) is a data structure used to implement an associative array, a structure that can map keys to values.
  • A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.
  • Max leaf size: max number of itemsets stored in a leaf node, if number of candidate itemsets exceeds max leaf size, split the node.

Factors Affecting Complexity

  1. Choice of minimum support threshold: Lowering support threshold results in more frequent itemsets. This may increase number of candidates and max length of frequent itemsets.
  2. Dimensionality (number of items) of the data set: More space is needed to store support count of each item. If number of frequent items increases, both computation and I/O costs may also increase.
  3. Size of database: Since Apriori makes multiple passes, run time of algorithm may increase with number of transactions.
  4. Average transaction width: Transaction width increases with denser data sets. This may increase max length of frequent itemsets and traversals of hash tree (number of subsets in a transaction increases with its width)