As organizations rely more and more on data to drive business decisions, Data Mining has become an important aid to decision making. It is the process of deriving trends, patterns, and useful information from a massive amount of data. Association Rule Mining is the data mining process of discovering the rules that govern associations and causal structures between sets of items. It helps uncover relationships between items that seem to be independent, thereby revealing connections within and across datasets.
In this article, you will learn about Association Rule Mining: its basic definitions, the metrics used to evaluate rules, its applications, and the algorithms used to generate association rules.
What is Association Rule Mining?
Association Rule Mining is a method for identifying frequent patterns, correlations, associations, or causal structures in data sets found in numerous databases such as relational databases, transactional databases, and other types of data repositories.
Most machine learning algorithms are mathematical in nature and work with numerical datasets. Association Rule Mining, by contrast, is suited to non-numeric, categorical data and requires only a little more than simple counting.
Given a set of transactions, the goal of association rule mining is to find the rules that allow us to predict the occurrence of a specific item based on the occurrences of the other items in the transaction.
An association rule consists of two parts:
- an antecedent (if) and
- a consequent (then)
An antecedent is an item or itemset found in the data, and a consequent is an item or itemset found in combination with the antecedent.
For a quick understanding, consider the following association rule:
“If a customer buys bread, they are 70% likely to buy milk.”
Bread is the antecedent in the given association rule, and milk is the consequent.
Use Cases of Association Rule Mining
- Market Basket Analysis: Discover products frequently bought together (e.g., “bread and butter”) for cross-selling and promotions.
- Customer Segmentation: Group customers with similar purchase histories to tailor marketing campaigns.
- Fraud Detection: Identify unusual transaction patterns that may indicate fraudulent activity.
- Web Usage Mining: Analyze user behavior on websites to improve navigation and user experience.
- Recommendation Systems: Suggest relevant products to customers based on their purchase history and preferences.
Usage of Association Rule Mining
The usage of Association Rule Mining is illustrated below:
1) Association Rule Mining: Basic Definitions
Before defining the rules of Association Rule Mining, let us first have a look at the basic definitions.
- Support Count(σ): It is the number of transactions in which an itemset occurs.
For example, if exactly two transactions in a dataset contain Milk, Bread, and Diaper together, then σ({Milk, Bread, Diaper})=2
- Frequent Itemset: It represents an itemset whose support is greater than or equal to the minimum threshold.
- Association Rule: It represents an implication expression of the form X -> Y. Here X and Y are two disjoint itemsets.
Example: {Milk, Diaper}->{Beer}
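To make these definitions concrete, here is a minimal Python sketch. The five transactions are hypothetical sample data (an assumption, chosen so that σ({Milk, Bread, Diaper}) = 2 as in the example above), and the helper names `support_count` and `is_frequent` are illustrative, not part of any library.

```python
# Hypothetical sample data: five market-basket transactions (an assumption,
# chosen so that the support count of {Milk, Bread, Diaper} is 2).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """Return sigma(itemset): the number of transactions containing every item."""
    items = set(itemset)
    return sum(1 for transaction in transactions if items <= transaction)


def is_frequent(itemset, transactions, min_count):
    """An itemset is frequent if its support count reaches the minimum threshold."""
    return support_count(itemset, transactions) >= min_count


print(support_count({"Milk", "Bread", "Diaper"}, TRANSACTIONS))
print(is_frequent({"Milk", "Diaper"}, TRANSACTIONS, min_count=3))
```

Because each transaction is a Python set, the subset test `items <= transaction` is all that is needed to decide whether a transaction contains an itemset.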
2) Association Rule Mining: Rule Evaluation Metrics
The rule evaluation metrics used in Association Rule Mining are as follows:
- Support(s): It is the fraction of all transactions that contain every item in both the X and Y parts of the rule. It shows how frequently the items of the rule occur together.
- Support(X=>Y) = σ(X∪Y) ÷ N, where N is the total number of transactions: It is the fraction of transactions that include both X and Y.
- Confidence(c): It is the ratio of the number of transactions that contain all of the items in both X and Y to the number of transactions that contain all of the items in X.
- Conf(X=>Y) = Supp(X∪Y) ÷ Supp(X): It measures how often the items in Y appear in transactions that also include the items in X.
- Lift(l): The lift of the rule X=>Y is the confidence of the rule divided by the expected confidence. The expected confidence is the confidence the rule would have if the itemsets X and Y were independent of one another, which is simply the support of Y.
- Lift(X=>Y) = Conf(X=>Y) ÷ Supp(Y): Lift values near 1 indicate that X and Y appear together about as often as expected under independence. Lift values greater than 1 indicate that they appear together more than expected, and lift values less than 1 indicate that they appear together less than expected. Greater lift values indicate a stronger association.
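The three metrics can be computed with a few lines of Python. The sketch below evaluates the rule {Milk, Diaper} -> {Beer} on a hypothetical five-transaction dataset; the data and the function names are assumptions made for illustration.

```python
# Hypothetical sample data (an assumption, used only for illustration).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    items = set(itemset)
    hits = sum(1 for transaction in transactions if items <= transaction)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Conf(X => Y) = Supp(X u Y) / Supp(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Lift(X => Y) = Conf(X => Y) / Supp(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)


x, y = {"Milk", "Diaper"}, {"Beer"}
print(f"support    = {support(x | y, TRANSACTIONS):.3f}")    # 2 of 5 transactions
print(f"confidence = {confidence(x, y, TRANSACTIONS):.3f}")  # 2 of the 3 with X
print(f"lift       = {lift(x, y, TRANSACTIONS):.3f}")
```

On this sample the lift comes out slightly above 1, so Beer appears with {Milk, Diaper} a little more often than independence would predict.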
Applications of Association Rule Mining
Some of the applications of Association Rule Mining are as follows:
1) Market-Basket Analysis
In most supermarkets, data is collected using barcode scanners. The resulting database, known as the “market basket” database, contains a large number of past transaction records. Each record lists all the items purchased by a customer in a single transaction. From this data, stores learn which groups of customers are inclined toward which sets of items, and they use this information to adjust the store layout and catalog so that related items are placed optimally next to one another.
2) Medical Diagnosis
Association rules in medical diagnosis can help physicians diagnose and treat patients. Diagnosis is a difficult process with many potential errors that can lead to unreliable results. You can use relational association rule mining to determine the likelihood of illness based on various factors and symptoms. This application can be extended with learning techniques that model symptoms and their relationships to diseases.
3) Census Data
Association Rule Mining is also used to work with the massive amount of data collected in a census. If properly analyzed, this information can be used in planning efficient public services and businesses.
Algorithms of Association Rule Mining
Some of the algorithms which can be used to generate association rules are as follows:
1) Apriori Algorithm
It proceeds by identifying the frequent individual items in the database and extending them to larger and larger itemsets, as long as those itemsets appear sufficiently often in the database.
The frequent itemsets determined by Apriori can also be used to derive association rules that highlight trends in the database. It counts the support of itemsets using a breadth-first search strategy and a candidate generation function that takes advantage of the downward closure property of support: every subset of a frequent itemset must itself be frequent.
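The level-by-level search described above can be sketched in plain Python. This is a simplified, unoptimized illustration under stated assumptions (hypothetical sample data, a minimum support given as an absolute count, and an illustrative function name `apriori`), not a production implementation.

```python
from itertools import combinations

# Hypothetical sample data (an assumption, used only for illustration).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def apriori(transactions, min_count):
    """Return {frozenset: support_count} for every frequent itemset."""
    # Level 1: count the individual items.
    counts = {}
    for transaction in transactions:
        for item in transaction:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Downward closure: every (k-1)-subset must already be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count the support of all candidates in one pass over the data.
        counts = {candidate: 0 for candidate in candidates}
        for transaction in transactions:
            for candidate in candidates:
                if candidate <= transaction:
                    counts[candidate] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


for itemset, count in sorted(apriori(TRANSACTIONS, 3).items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
```

The downward closure check is what keeps the candidate set small: a 3-itemset is only counted if all three of its 2-item subsets survived the previous level.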
2) Eclat Algorithm
Eclat stands for Equivalence Class Transformation. It finds frequent itemsets using set intersection and a depth-first search. It is suitable for both sequential and parallel execution, with locality-enhancing properties. It is an algorithm for frequent pattern mining based on a depth-first traversal of the itemset lattice.
- It is a DFS traversal of the prefix tree rather than of the lattice.
- The branch and bound technique is used to decide when to stop exploring a branch.
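A minimal sketch of the idea follows, assuming hypothetical sample data and an illustrative function name `eclat`. Each item is mapped to the set of transaction IDs that contain it, and the support of a larger itemset is obtained by intersecting those sets during a depth-first search.

```python
# Hypothetical sample data (an assumption, used only for illustration).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def eclat(transactions, min_count):
    """Return {frozenset: support_count} for every frequent itemset."""
    # Vertical layout: item -> set of IDs of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the tidsets of the remaining items; a branch is
            # abandoned as soon as the intersection is too small.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    next_candidates.append((other, common))
            if next_candidates:
                extend(itemset, next_candidates)  # go deeper first (DFS)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


for itemset, count in eclat(TRANSACTIONS, 3).items():
    print(sorted(itemset), count)
```

Unlike the breadth-first approach, this version never rescans the transactions after the first pass; all later support counts come from set intersections.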
3) FP-growth Algorithm
FP-growth stands for frequent pattern growth. The FP-growth algorithm finds frequent itemsets in a transaction database without generating candidates.
It first compresses the database into a structure that retains the frequent itemsets, and then divides the compressed data into a set of conditional databases.
Each conditional database is associated with one frequent itemset and is then mined separately.
The data source is compressed using the FP-tree data structure.
This algorithm operates in two stages. These are as follows:
- FP-tree construction
- Extraction of frequent itemsets
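The two stages can be sketched compactly in Python. This is a simplified illustration under stated assumptions: the sample data is hypothetical, the names `Node`, `build_tree`, and `fp_growth` are illustrative, and each conditional database is rebuilt as a fresh tree for clarity rather than speed.

```python
from collections import defaultdict

# Hypothetical sample data (an assumption, used only for illustration).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


class Node:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Stage 1: build an FP-tree. Returns (header table, frequent item counts)."""
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in items:
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, weight in weighted_transactions:
        # Keep only frequent items, most frequent first, so prefixes are shared.
        ordered = sorted((i for i in items if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Stage 2: mine frequent itemsets from conditional databases."""
    header, frequent = build_tree(weighted_transactions, min_count)
    result = {}
    for item, count in frequent.items():
        itemset = suffix | {item}
        result[itemset] = count
        # Conditional database: the prefix path above every node for this item.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        result.update(fp_growth(conditional, min_count, itemset))
    return result


itemsets = fp_growth([(t, 1) for t in TRANSACTIONS], min_count=3)
for itemset, count in itemsets.items():
    print(sorted(itemset), count)
```

No candidate itemsets are generated and tested here; every itemset that is emitted is already known to be frequent from the counts stored in the tree.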
How to Implement Association Rule Mining Using Python
Step 1: Install PyCaret
# install pycaret (the arules module ships with PyCaret 2.x; it was removed in 3.0)
pip install "pycaret<3"
Step 2: Load Sample Data
# load sample data
from pycaret.datasets import get_data
data = get_data('france')
Step 3: init Setup
# init setup
from pycaret.arules import *
s = setup(data = data, transaction_id = 'InvoiceNo', item_id = 'Description')
Step 4: Create Model
The `create_model` function runs the algorithm and returns the rules as a pandas DataFrame, based on the selection parameters passed to `create_model`. In this example, we have used the selection metric `confidence` with a threshold of 0.5 and a minimum support of 0.05.
# train model
arules = create_model(metric='confidence', threshold=0.5, min_support=0.05)
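The `transaction_id` and `item_id` arguments name the column that identifies a basket and the column that identifies an item. As a rough illustration of that grouping in plain Python, the sketch below turns "one row per purchased item" data into one itemset per invoice. The rows are hypothetical, the helper `to_transactions` is an illustrative name, and this shows the data shape only; it is not PyCaret's internal implementation.

```python
from collections import defaultdict

# Hypothetical rows in "long" format: one row per item on an invoice.
ROWS = [
    {"InvoiceNo": "1001", "Description": "TEA SET"},
    {"InvoiceNo": "1001", "Description": "SUGAR BOWL"},
    {"InvoiceNo": "1002", "Description": "TEA SET"},
    {"InvoiceNo": "1002", "Description": "LUNCH BAG"},
    {"InvoiceNo": "1002", "Description": "SUGAR BOWL"},
    {"InvoiceNo": "1003", "Description": "LUNCH BAG"},
]


def to_transactions(rows, transaction_id, item_id):
    """Group rows into {transaction id: set of items}."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row[transaction_id]].add(row[item_id])
    return dict(baskets)


baskets = to_transactions(ROWS, transaction_id="InvoiceNo", item_id="Description")
for invoice, items in baskets.items():
    print(invoice, sorted(items))
```

Once the data is in this shape, each basket is one transaction in the sense used by the support, confidence, and lift definitions.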
Drawbacks of Association Rule Mining
The primary disadvantages of Association Rule Mining are as follows:
- A lengthy procedure that often yields uninteresting rules.
- Having a large number of discovered rules.
- Low performance of the Association Rule algorithms.
- Consideration of a lot of parameters for obtaining the rules.
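To give a sense of why the number of discovered rules can become so large, the following sketch counts every possible rule X -> Y (with X and Y non-empty and disjoint) over d distinct items, once by brute force and once with the closed-form count 3^d - 2^(d+1) + 1. The function names are illustrative.

```python
from itertools import product


def count_rules_brute_force(d):
    """Assign each of d items to X, to Y, or to neither; count valid rules."""
    total = 0
    for assignment in product(("X", "Y", "-"), repeat=d):
        if "X" in assignment and "Y" in assignment:
            total += 1
    return total


def count_rules_formula(d):
    """Closed form for the same count: 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1


for d in (2, 4, 6, 8):
    print(d, count_rules_brute_force(d), count_rules_formula(d))
```

With only 6 distinct items there are already 602 possible rules, which is why minimum support and confidence thresholds are needed to keep the output manageable.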
Conclusion
In this article, you have learned about Association Rule Mining. This article also provided information on Data Mining, the different rules and definitions associated with Association Rule Mining, its applications, and the algorithms used to generate association rules. Sign up for Hevo’s 14-day trial and learn more about your data.
Share your experience in the comment section below! We would love to hear your thoughts.
FAQs
1. What is association rule mining?
Association rule mining is a data mining technique used to identify relationships or patterns among variables in large datasets, such as “If a customer buys item A, they are likely to buy item B.”
2. What are the disadvantages of association rule mining?
- Generating a large number of rules, many of which may be irrelevant.
- Computationally intensive for large datasets.
- Difficulty in determining thresholds for support and confidence.
3. What is an example of an association rule?
An example is: “If a customer buys bread, they are 80% likely to buy butter.” Here, bread → butter represents the association rule.
Manisha Jena is a data analyst with over three years of experience in the data industry and is well-versed with advanced data tools such as Snowflake, Looker Studio, and Google BigQuery. She is an alumna of NIT Rourkela and excels in extracting critical insights from complex databases and enhancing data visualization through comprehensive dashboards. Manisha has authored over a hundred articles on diverse topics related to data engineering, and loves breaking down complex topics to help data practitioners solve their doubts related to data engineering.