Association Rule Mining - Apriori Algorithm

Adekanmbi 'Yosola
5 min read · Dec 17, 2018

Association rule mining is one of the most important machine learning concepts used in market basket analysis.

Market basket analysis is the study of customer transaction databases to determine dependencies between the various items customers purchase at different times.

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It identifies frequent if-then associations, called association rules, each of which consists of an antecedent (if) and a consequent (then).

For example: “If tea and milk, then sugar” (“If a customer purchases tea and milk, then the customer is likely to buy sugar as well”)

Antecedent: Tea and Milk

Consequent: Sugar.

There are three common metrics to measure association:

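Support is an indication of how frequently the items appear in the data. Mathematically, support is the fraction of the total number of transactions in which the itemset occurs: support(X) = (number of transactions containing X) / (total number of transactions).

Confidence indicates how often the if-then statement is found to be true. It is the conditional probability of the consequent occurring given the antecedent: confidence(X → Y) = support(X and Y together) / support(X).

Lift compares the observed confidence with the expected confidence. It says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. Mathematically, lift(X → Y) = confidence(X → Y) / support(Y). A lift above 1 suggests that X and Y are bought together more often than would be expected if they were independent.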
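To make the three metrics concrete, here is a minimal sketch in plain Python that computes them for the tea, milk and sugar rule. The five transactions are made up for illustration and are not the pharmacy data; the function names are just for this example.

```python
# Toy transactions, invented for illustration (not the pharmacy data).
transactions = [
    {"tea", "milk", "sugar"},
    {"tea", "milk", "sugar"},
    {"tea", "milk"},
    {"sugar", "bread"},
    {"bread"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """confidence(antecedent -> consequent) / support(consequent)."""
    return confidence(antecedent, consequent) / support(consequent)

antecedent, consequent = {"tea", "milk"}, {"sugar"}
print("support:   ", support(antecedent | consequent))
print("confidence:", confidence(antecedent, consequent))
print("lift:      ", lift(antecedent, consequent))
```

In this toy data the rule holds in 2 of 5 baskets (support 0.4) and in 2 of the 3 baskets that contain tea and milk (confidence of about 0.67). Since sugar appears in 3 of 5 baskets, the lift is about 1.11, slightly above 1.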

With the millions of rows and columns that can exist in a transactional database, it is hard to apply these formulas by hand to find relations among itemsets. In this post, I’ll be using the apriori algorithm from the mlxtend library. Apriori is a popular algorithm for extracting frequent itemsets.

We import the libraries to be used: NumPy for working with large, multi-dimensional arrays and matrices, pandas for data structures and operations for manipulating tabular data, and Matplotlib for plotting line charts, bar charts, histograms, etc.

We also import the apriori algorithm from the mlxtend library.
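For intuition about what such a library does internally, here is a rough sketch of the level-wise idea behind Apriori in plain Python. It is not mlxtend's implementation, and the function name `apriori_sketch` and the toy baskets are invented for this example. The key observation is that an itemset can only be frequent if all of its subsets are frequent, so larger candidates are built only from smaller frequent itemsets.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate and keep those that reach min_support.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = frequent({frozenset([i]) for i in items})
    result = dict(current)

    size = 2
    while current:
        # Join: combine frequent (size-1)-itemsets into candidates of `size` items.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune: drop candidates that have an infrequent (size-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = frequent(candidates)
        result.update(current)
        size += 1
    return result

baskets = [
    {"tea", "milk", "sugar"},
    {"tea", "milk", "sugar"},
    {"tea", "milk"},
    {"sugar", "bread"},
    {"bread"},
]
found = apriori_sketch(baskets, min_support=0.4)
for itemset, sup in sorted(found.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```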

The data used for this analysis is a pharmacy’s POS transactional data for the month of May. You can find the data and all the code hosted on Kaggle.

We want to help the pharmacist understand the pattern in which people buy from his pharmacy, so that he can plan how to position items around the pharmacy. Knowing which items customers want also lets him stock his shelves accordingly, so that he makes optimum profit without having to turn customers away.

We start by importing the dataset. We only require the “Article” and “Ref” columns, so the other columns are dropped from the table using the pandas drop function.

From the dataset, we observe that items with equal ‘Ref’ values belong to the same transaction; that is, they were bought together in the same basket. We therefore need to combine the items of each basket/transaction into a list.
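As a rough illustration of this step, the sketch below builds a small stand-in table and drops the columns that are not needed. The rows and the extra columns "Qty" and "Price" are hypothetical; only "Ref" and "Article" are named in this post, and in practice the table would be read from the data file (for example with `pd.read_csv`).

```python
import pandas as pd

# Stand-in for the POS export. "Qty" and "Price" are hypothetical columns
# and the rows are invented; only "Ref" and "Article" come from the post.
raw = pd.DataFrame({
    "Ref": [1001, 1001, 1002, 1003, 1003, 1003],
    "Article": ["Abidec Drops", "Emvit C. Drop", "Aboniki Balm",
                "Aboniki Balm", "Robb Bottle", "Ibuprofen Syrup"],
    "Qty": [1, 1, 2, 1, 1, 1],
    "Price": [500, 450, 300, 300, 250, 400],
})

# Keep only the two columns the analysis needs by dropping the rest.
df = raw.drop(columns=["Qty", "Price"])
print(df.head())
```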

With the aid of the pandas groupby function, we group the items in the same transaction/basket together. The data is also prepared in a format accepted by the apriori algorithm.

One-hot encoding: This is the process of consolidating the items into one transaction per row. It can be done manually. The columns represent the unique items present in the input, and the rows represent the individual transactions.

The apriori implementation in mlxtend expects a one-hot encoded table whose values are only 0/1 (or True/False). We therefore need to replace all values ≥1 by 1 and all values <1 by 0, so a function is created to do that.

Looking at the shape of the dataset, there are about 1198 unique items being sold in the pharmacy and 700 distinct transactions.

As the data covers only one month of sales, it is not large enough to show high support among itemsets. We therefore decided to look at frequent itemsets that have a support of at least 0.1%.
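A minimal sketch of the grouping step, assuming a table with the "Ref" and "Article" columns described earlier (the rows here are invented):

```python
import pandas as pd

# Invented rows; each "Ref" value identifies one transaction/basket.
df = pd.DataFrame({
    "Ref": [1001, 1001, 1002, 1003, 1003, 1003],
    "Article": ["Abidec Drops", "Emvit C. Drop", "Aboniki Balm",
                "Aboniki Balm", "Robb Bottle", "Ibuprofen Syrup"],
})

# Collect the articles of each transaction into one list per "Ref".
baskets = df.groupby("Ref")["Article"].apply(list)
print(baskets)
```

The result is a Series indexed by "Ref", with one list of purchased items per transaction.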
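One way to build such a table by hand is sketched below. The baskets are invented, and the variable names are only for this example. Each cell counts how often an item occurs in a basket, so a repeated item can produce a value above 1.

```python
import pandas as pd

# Invented baskets keyed by transaction reference.
baskets = {
    1001: ["Abidec Drops", "Emvit C. Drop"],
    1002: ["Aboniki Balm", "Aboniki Balm"],  # same item scanned twice
    1003: ["Aboniki Balm", "Robb Bottle", "Ibuprofen Syrup"],
}

# Columns: every unique item. Rows: one per transaction.
items = sorted({item for basket in baskets.values() for item in basket})
rows = [[basket.count(item) for item in items] for basket in baskets.values()]

onehot = pd.DataFrame(rows, index=list(baskets.keys()), columns=items)
print(onehot)
```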
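A function of this kind could look like the sketch below. The name `encode_units` and the small count table are assumptions made for illustration.

```python
import pandas as pd

def encode_units(x):
    """Map any count of 1 or more to 1, and anything below 1 to 0."""
    return 1 if x >= 1 else 0

# Invented item counts per transaction.
counts = pd.DataFrame(
    {
        "Abidec Drops": [1, 0, 0],
        "Aboniki Balm": [0, 2, 1],
        "Robb Bottle": [0, 0, 3],
    },
    index=[1001, 1002, 1003],
)

# Apply the function to every cell, one column (Series) at a time.
encoded = counts.apply(lambda col: col.map(encode_units))
print(encoded)
```

Mapping column by column with `Series.map` avoids relying on a DataFrame-wide elementwise method, whose name has changed between pandas versions.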

We then run apriori to select the most important itemsets.

Now that we have identified the key itemsets, let us generate the association rules to learn the purchase behaviours.

Thus, we observe that:

  1. Abidec Drops and Emvit C. Drop are purchased together.
  2. Aboniki Balm and Robb Bottle are also purchased together.
  3. Ibuprofen Syrup and Allergin Syrup are purchased together as well.
  4. Etc.

Based on the insights from market basket analysis, the pharmacist can organize his store so that items that go together are placed near each other. This helps him find them easily and thereby reduces customers' waiting time. It can also help him prioritize which items to stock more of, in order not to turn customers away and lose profit.

This story is published in Noteworthy, where 10,000+ readers come every day to learn about the people & ideas shaping the products we love.
