
Apriori Algorithm Not Showing Result

I am using Python for market basket analysis. When I execute this code, it shows only the column names without any results:

frequent_tr = apriori(data_tr, min_support=0.05)

Solution 1:

Your CSV has 142,155 rows and 142,103 unique transaction_id values. That means at most 52 transactions contain more than one service_type. How do you intend to apply an apriori model with so few associations? Could it be that you want to run apriori at the geohash_user level rather than at the transaction level?
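
One way to check this before running apriori is to count how many transaction_id values carry more than one distinct service_type. A minimal sketch, assuming the column names from the question (transaction_id, service_type); the rows themselves are made up for illustration:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
toy = pd.DataFrame({
    "transaction_id": ["t1", "t1", "t2", "t3", "t4", "t4"],
    "service_type": ["Painting", "Plumbing Repair", "Painting",
                     "Aircond Servicing", "Painting", "Painting"],
})

# Number of distinct service types booked in each transaction.
services_per_tx = toy.groupby("transaction_id")["service_type"].nunique()

# Only these transactions can contribute to an itemset of size 2 or more.
multi_service_tx = int((services_per_tx > 1).sum())

print(len(toy), toy["transaction_id"].nunique(), multi_service_tx)
```

Note that a transaction repeating the same service (t4 here) adds a row but no association, so the row count minus the unique-id count is only an upper bound.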

Besides that, and assuming you want the user-level analysis, I am not sure why you need TransactionEncoder.

I guess what you are trying to achieve is a dataframe that holds 1 (True) where the value is greater than 0 and 0 (False) otherwise. That is the input apriori expects, because it does not care whether the same transaction contained 1 or 5 units of the same type.

def encode_units(x):
    # Collapse any positive count to 1; apriori only needs presence/absence.
    if x <= 0:
        return 0
    elif x >= 1:
        return 1


import pandas as pd
from mlxtend.frequent_patterns import apriori

data = pd.read_csv('Dataset - Transaction.csv')

# One row per user, one column per service type, 0 where the user never used it.
data_tr = data.groupby(['geohash_user', 'service_type']).sum().unstack().reset_index().fillna(0).set_index('geohash_user').droplevel(0, axis=1)

data_tr_encoded2 = data_tr.applymap(encode_units)

# Keep only users with at least 2 services; a single service cannot form an association rule.
data_tr_encoded_filt = data_tr_encoded2[(data_tr_encoded2 > 0).sum(axis=1) >= 2]

frequent_tr_encoded = apriori(data_tr_encoded_filt, min_support=0.05, use_colnames=True)
    support   itemsets
0   0.054093  (Aircond Repair)
1   0.186669  (Aircond Servicing)
2   0.090622  (Electrical Wiring / Power Point)
3   0.078008  (Local Moving - Budget Lorry)
4   0.060556  (Painting)
5   0.170405  (Plumbing Repair)
6   0.054093  (Aircond Repair)
7   0.186669  (Aircond Servicing)
8   0.090622  (Electrical Wiring / Power Point)
9   0.078008  (Local Moving - Budget Lorry)
10  0.060556  (Painting)
11  0.170405  (Plumbing Repair)
12  0.054093  (Aircond Repair)
13  0.186669  (Aircond Servicing)
14  0.090622  (Electrical Wiring / Power Point)
15  0.078008  (Local Moving - Budget Lorry)
16  0.060556  (Painting)
17  0.170405  (Plumbing Repair)

Solution 2:

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder

data = pd.read_csv('Dataset - Transaction.csv')

# One row per user, one column per service type, 0 where the user never used it.
data_tr = data.groupby(['geohash_user', 'service_type']).sum().unstack().reset_index().fillna(0).set_index('geohash_user').droplevel(0, axis=1)

# Binarize: 1 where the user used the service at least once, 0 otherwise.
data_tr_list = pd.DataFrame(np.where(np.array(data_tr.values.tolist()) >= 1, 1, 0), columns=data_tr.columns)

frequent_tr_encoded = apriori(data_tr_list, min_support=0.05, use_colnames=True)

Output (same as in CarlosSR's answer)

    support   itemsets
0   0.054093  (Aircond Repair)
1   0.186669  (Aircond Servicing)
2   0.090622  (Electrical Wiring / Power Point)
3   0.078008  (Local Moving - Budget Lorry)
4   0.060556  (Painting)
5   0.170405  (Plumbing Repair)
6   0.054093  (Aircond Repair)
7   0.186669  (Aircond Servicing)
8   0.090622  (Electrical Wiring / Power Point)
9   0.078008  (Local Moving - Budget Lorry)
10  0.060556  (Painting)
11  0.170405  (Plumbing Repair)
12  0.054093  (Aircond Repair)
13  0.186669  (Aircond Servicing)
14  0.090622  (Electrical Wiring / Power Point)
15  0.078008  (Local Moving - Budget Lorry)
16  0.060556  (Painting)
17  0.170405  (Plumbing Repair)

Second, grouping by transaction_id instead of geohash_user:

data_tr = data.groupby(['transaction_id', 'service_type']).sum().unstack().reset_index().fillna(0).set_index('transaction_id').droplevel(0, axis=1)
data_tr_list = pd.DataFrame(np.where(np.array(data_tr.values.tolist()) >= 1, 1, 0), columns=data_tr.columns)
frequent_tr_encoded = apriori(data_tr_list, min_support=0.05, use_colnames=True)

Output

   support   itemsets
0  0.131081  (Aircond Servicing)
1  0.058486  (Electrical Wiring / Power Point)
2  0.050062  (Local Moving - Budget Lorry)
3  0.114593  (Plumbing Repair)

EDIT

The apriori function accepts only True, False, 0, and 1 as values in the input DataFrame.
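
If the pivoted table still holds counts or sums, it can be converted in one step with a comparison. A minimal sketch on a hypothetical frame; the name `counts` and its values are made up for illustration:

```python
import pandas as pd

# Hypothetical per-user counts of each service type.
counts = pd.DataFrame(
    {"Painting": [0, 2, 5], "Plumbing Repair": [1, 0, 3]},
    index=["u1", "u2", "u3"],
)

# True where the service was used at least once, False otherwise.
encoded = counts >= 1

# Sanity check: every column should now have boolean dtype.
all_boolean = bool((encoded.dtypes == bool).all())
print(encoded)
print(all_boolean)
```

Unlike rebuilding the frame from a NumPy array, the comparison keeps the original index, so the user ids stay attached to their rows.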

Keep only the rows whose sum along axis 1 is at least 2:

data_tr_list = data_tr_list[data_tr_list.sum(axis=1) >= 2]
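
When a result comes back with only the column headers, one possible cause is that no item reaches min_support. For a 0/1 table, the support of a single item is the fraction of rows containing it, which equals the column mean, so this can be checked without apriori. A sketch with hypothetical data (`onehot` and the 0.5 threshold are made up for illustration):

```python
import pandas as pd

# Hypothetical one-hot table: rows are baskets, columns are items.
onehot = pd.DataFrame({
    "Painting":        [1, 0, 0, 0],
    "Plumbing Repair": [1, 1, 0, 1],
})

# Support of each single item = share of rows in which it appears.
item_support = onehot.mean()

# Items that would pass the threshold as 1-itemsets.
min_support = 0.5
passing = item_support[item_support >= min_support].index.tolist()
print(item_support)
print(passing)
```

A larger itemset can never have higher support than any of its items, so if no column mean reaches min_support, the frequent itemset table will be empty.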
