• Given a set of transactions T, the goal of association rule mining is to find all rules with support ≥ a minsup threshold and confidence ≥ a minconf threshold.
• Some approaches to association rule mining are:

### 1. Brute-Force Approach

• List all possible association rules.
• Compute the support and confidence for each rule.
• Prune rules that fail the minimum support or minimum confidence threshold.

*This approach is computationally very expensive.*
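The brute-force approach above can be sketched directly: enumerate every candidate rule, compute its support and confidence, and keep only those passing both thresholds. This is a minimal illustration over a small hypothetical transaction database (the item names and thresholds are made up for the example).

```python
from itertools import combinations

# Hypothetical toy transaction database for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
items = set().union(*transactions)

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def brute_force_rules(minsup, minconf):
    """Enumerate all rules X -> Y, keeping those that pass both thresholds."""
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(sorted(items), size):
            itemset = frozenset(itemset)
            sup = support(itemset)
            if sup < minsup:
                continue  # prune: rule cannot meet minimum support
            # Split the itemset into every antecedent/consequent pair.
            for r in range(1, len(itemset)):
                for lhs in combinations(sorted(itemset), r):
                    lhs = frozenset(lhs)
                    conf = sup / support(lhs)
                    if conf >= minconf:
                        rules.append((lhs, itemset - lhs, sup, conf))
    return rules

rules = brute_force_rules(minsup=0.4, minconf=0.6)
```

The nested loops make the cost visible: every itemset and every split of it is examined, which is why the approach does not scale.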

### 2. Frequent Itemset Generation Strategies

• Reduce the number of candidates (M): for a complete search, M = 2^d, where d is the number of items. Use pruning techniques to reduce M.
• Reduce the number of transactions (N): reduce the size of N as the size of the itemset increases.
• Reduce the number of comparisons (NM): Use efficient data structures to store the candidates or transactions. No need to match every candidate against every transaction.
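The M = 2^d figure can be verified by enumerating the full candidate space for a tiny hypothetical item set (counting the empty set):

```python
from itertools import chain, combinations

# With d items, a complete search considers M = 2^d candidate itemsets
# (including the empty set); the strategies above all aim to shrink this.
items = ["bread", "milk", "diapers", "beer"]  # d = 4, a toy example
all_itemsets = list(chain.from_iterable(
    combinations(items, k) for k in range(len(items) + 1)))
assert len(all_itemsets) == 2 ** len(items)  # M = 2^4 = 16
```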

### 3. Apriori Approach

• The Apriori approach is a two-step approach: frequent itemset generation and rule generation.
• It is based on the Apriori principle.

#### Apriori Principle:

• Supersets of a non-frequent itemset are also non-frequent. Equivalently, if an itemset is frequent, then all of its subsets must also be frequent.
• The Apriori algorithm is an influential algorithm for mining frequent itemsets.
• It uses a level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets.
• First, the set of frequent 1-itemsets is found and used to generate the frequent itemsets at the next level, and so on.
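The principle gives a cheap pruning test: a k-itemset can only be frequent if every (k−1)-subset of it was frequent at the previous level. A minimal sketch, using hypothetical frequent 2-itemsets:

```python
from itertools import combinations

# Hypothetical frequent 2-itemsets found at the previous level.
frequent_2 = {frozenset(s) for s in [("bread", "milk"),
                                     ("bread", "diapers"),
                                     ("milk", "diapers"),
                                     ("diapers", "beer")]}

def all_subsets_frequent(candidate, frequent_prev):
    """Apriori principle: keep a k-itemset candidate only if every
    (k-1)-subset of it is frequent; otherwise it can be pruned."""
    k = len(candidate)
    return all(frozenset(sub) in frequent_prev
               for sub in combinations(candidate, k - 1))

# {bread, milk, diapers}: all three 2-subsets are frequent -> keep.
# {bread, diapers, beer}: {bread, beer} is not frequent -> prune.
keep = all_subsets_frequent({"bread", "milk", "diapers"}, frequent_2)
prune = not all_subsets_frequent({"bread", "diapers", "beer"}, frequent_2)
```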

#### Apriori Algorithm:

• Read the transaction database and get the support of each itemset; compare the support with the minimum support to generate the frequent itemsets at level 1.
• Use a join to generate the set of candidate k-itemsets at the next level.
• Generate the frequent itemsets at the next level using the minimum support.
• Repeat steps 2 and 3 until no more frequent itemsets can be generated.
• Generate rules from the frequent itemsets, from level 2 onwards, using the minimum confidence.

*This approach is faster than the Brute-Force approach but still has high computational complexity.*
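The level-wise steps above can be sketched as follows. This is a simplified version over a hypothetical toy database: the join step here forms unions of frequent (k−1)-itemsets, and the subset-based candidate pruning is omitted for brevity.

```python
def apriori(transactions, minsup_count):
    """Level-wise frequent itemset mining (a sketch of the steps above)."""
    # Level 1: count each individual item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c >= minsup_count}
    all_frequent = {s: counts[s] for s in frequent}
    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-candidates.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # One pass over the database to count candidate supports.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent = {c for c, n in counts.items() if n >= minsup_count}
        all_frequent.update((c, counts[c]) for c in frequent)
        k += 1
    return all_frequent

# Hypothetical toy transaction database for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
freq = apriori(transactions, minsup_count=3)
```

Rule generation (step 5) would then split each frequent itemset of size ≥ 2 into antecedent and consequent and filter by minimum confidence, as in the brute-force sketch but restricted to frequent itemsets only.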

#### Candidate counting:

• Scan the database of transactions to determine the support of each candidate itemset.
• To reduce the number of comparisons, store the candidates in a hash structure.
• Instead of matching each transaction against every candidate, match it against only the candidates contained in the hashed buckets.
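A simplified sketch of this idea for 2-itemset candidates (a full Apriori implementation would use a hash tree; the hash function and bucket count here are illustrative assumptions): candidates are hashed into buckets, and each transaction probes only the buckets reachable from its own subsets.

```python
from itertools import combinations

# Hypothetical candidate 2-itemsets and toy transactions.
candidates = [frozenset(c) for c in [("bread", "milk"),
                                     ("bread", "beer"),
                                     ("milk", "diapers"),
                                     ("diapers", "beer")]]
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

NUM_BUCKETS = 7  # illustrative choice

def bucket_key(itemset):
    # Simple illustrative hash: sum of item hashes modulo bucket count.
    return sum(hash(i) for i in itemset) % NUM_BUCKETS

# Distribute candidates into hash buckets once, up front.
buckets = {}
for c in candidates:
    buckets.setdefault(bucket_key(c), []).append(c)

def count_supports(transactions):
    counts = {c: 0 for c in candidates}
    for t in transactions:
        # Each 2-subset of the transaction is checked only against the
        # candidates in its own bucket, not against every candidate.
        for pair in combinations(sorted(t), 2):
            pair = frozenset(pair)
            if pair in buckets.get(bucket_key(pair), []):
                counts[pair] += 1
    return counts

support_counts = count_supports(transactions)
```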

### 4. Hash Table

• A hash table (hash map) is a data structure used to implement an associative array, a structure that can map keys to values.
• A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.
• Max leaf size: the maximum number of itemsets stored in a leaf node; if the number of candidate itemsets exceeds the max leaf size, the node is split.
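Python's built-in `dict` is a hash table, so the key-to-value mapping can be shown directly: the key's hash selects a bucket, from which the stored value is retrieved in expected constant time. A small sketch using itemset support counting as the use case:

```python
# A dict maps hashed keys to values; frozenset keys are hashable and
# order-independent, so {"bread","milk"} and {"milk","bread"} collide
# on the same entry, as desired for itemset counting.
support_counts = {}
for itemset in [("bread",), ("milk",), ("bread",), ("milk", "bread"),
                ("bread", "milk")]:
    key = frozenset(itemset)
    support_counts[key] = support_counts.get(key, 0) + 1
```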

## Factors Affecting Complexity

1. Choice of minimum support threshold: Lowering support threshold results in more frequent itemsets. This may increase number of candidates and max length of frequent itemsets.
2. Dimensionality (number of items) of the data set: More space is needed to store support count of each item. If number of frequent items increases, both computation and I/O costs may also increase.
3. Size of database: Since Apriori makes multiple passes, run time of algorithm may increase with number of transactions.
4. Average transaction width: Transaction width increases with denser data sets. This may increase the max length of frequent itemsets and the number of hash tree traversals (the number of subsets in a transaction increases with its width).