Apriori Algorithm Not Showing Result
Solution 1:
Your CSV has 142,155 rows and 142,103 unique transaction_id values. That means at most 52 of your transaction_ids can have more than one service_type... how do you intend to build an apriori model from at most 52 associations? Could it be that you intend to run apriori not at the transaction level but at the geohash_user level?
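You can check this directly by counting distinct service types per transaction and per user. A small pandas sketch on made-up rows (the values are hypothetical; only the column names transaction_id, geohash_user and service_type come from the question). On the real file you would load the DataFrame with pd.read_csv instead.

```python
import pandas as pd

# Hypothetical toy rows; only the column names come from the question.
data = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4, 5, 5, 6],
    "geohash_user": ["u1", "u1", "u2", "u2", "u3", "u3", "u4"],
    "service_type": ["Painting", "Plumbing Repair", "Aircond Servicing",
                     "Aircond Repair", "Painting", "Aircond Servicing",
                     "Plumbing Repair"],
})

n_rows = len(data)
n_transactions = data["transaction_id"].nunique()
# Rows beyond one per transaction: an upper bound on multi-row transactions
extra_rows = n_rows - n_transactions

# Exact count of transactions containing two or more distinct service types
per_tx = data.groupby("transaction_id")["service_type"].nunique()
multi_service_tx = int((per_tx > 1).sum())

# The same count at the user level
per_user = data.groupby("geohash_user")["service_type"].nunique()
multi_service_users = int((per_user > 1).sum())

print(n_rows, n_transactions, extra_rows, multi_service_tx, multi_service_users)
# 7 6 1 1 3
```

On the real file extra_rows would be 52 (142,155 minus 142,103), while multi_service_tx gives the exact number of multi-service transactions; comparing it with multi_service_users shows how many more multi-item baskets the user-level view provides.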
Besides that, and assuming you want to go with the user-level analysis, I'm not quite sure why you need TransactionEncoder.
I guess what you are trying to achieve is a DataFrame that holds 1 (True) where the value is greater than 0 and 0 (False) otherwise. At least, that is the input apriori expects: it only cares whether an item appears in a transaction, not whether there were 1 or 5 units of it.
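The same encoding can be written without a per-cell function: comparing the whole DataFrame with 0 gives the presence flags in one vectorized step, which for whole-number counts matches what the encode_units function in this answer computes. A minimal sketch on hypothetical counts shaped like data_tr:

```python
import pandas as pd

# Hypothetical per-user counts, shaped like data_tr
# (rows: geohash_user, columns: service_type)
counts = pd.DataFrame(
    {"Painting": [0, 5, 1], "Plumbing Repair": [2, 0, 1]},
    index=pd.Index(["u1", "u2", "u3"], name="geohash_user"),
)

# Presence flags: True where the service was used at least once.
# A boolean DataFrame is valid apriori input.
presence = counts > 0

# The same information as 0/1 integers
presence_int = presence.astype(int)
print(presence_int)
```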
import pandas as pd
from mlxtend.frequent_patterns import apriori

def encode_units(x):
    # 1 if the service was ordered at least once, otherwise 0
    if x <= 0:
        return 0
    return 1

data = pd.read_csv('Dataset - Transaction.csv')
data_tr = (data.groupby(['geohash_user', 'service_type']).sum().unstack()
           .reset_index().fillna(0).set_index('geohash_user').droplevel(0, 1))
# DataFrame.applymap was renamed DataFrame.map in pandas 2.1
data_tr_encoded2 = data_tr.applymap(encode_units)
# we only need users with more than one service in order to get association rules
data_tr_encoded_filt = data_tr_encoded2[(data_tr_encoded2 > 0).sum(axis=1) >= 2]
frequent_tr_encoded = apriori(data_tr_encoded_filt, min_support=0.05, use_colnames=True)
    support   itemsets
0   0.054093  (Aircond Repair)
1   0.186669  (Aircond Servicing)
2   0.090622  (Electrical Wiring / Power Point)
3   0.078008  (Local Moving - Budget Lorry)
4   0.060556  (Painting)
5   0.170405  (Plumbing Repair)
6   0.054093  (Aircond Repair)
7   0.186669  (Aircond Servicing)
8   0.090622  (Electrical Wiring / Power Point)
9   0.078008  (Local Moving - Budget Lorry)
10  0.060556  (Painting)
11  0.170405  (Plumbing Repair)
12  0.054093  (Aircond Repair)
13  0.186669  (Aircond Servicing)
14  0.090622  (Electrical Wiring / Power Point)
15  0.078008  (Local Moving - Budget Lorry)
16  0.060556  (Painting)
17  0.170405  (Plumbing Repair)
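Every itemset in this output appears three times with identical support. A likely cause (an inference from the output, not checked against the CSV) is that the file has several numeric columns: .sum() sums each of them, .unstack() creates one block of service_type columns per numeric column, and droplevel(0, 1) then removes the level that told the blocks apart, so each service name ends up as several columns. The sketch below reproduces this on made-up data with two numeric columns (the quantity column is hypothetical) using a shorter form of the same chain, and shows pd.crosstab, which yields exactly one column per service_type.

```python
import pandas as pd

# Hypothetical toy data: two numeric columns besides the two keys.
# "quantity" is an assumed column name, not taken from the question.
data = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "quantity": [1, 1, 2, 1],
    "geohash_user": ["u1", "u1", "u2", "u2"],
    "service_type": ["Painting", "Plumbing Repair", "Painting", "Painting"],
})

# Shorter form of the chain above: every numeric column is summed and
# unstacked, then droplevel(0, axis=1) drops the numeric-column level,
# so each service_type name appears once per numeric column.
dup = (data.groupby(["geohash_user", "service_type"]).sum()
       .unstack(fill_value=0)
       .droplevel(0, axis=1))
print(list(dup.columns))

# One column per service_type: count rows per (user, service), then flag presence
basket = pd.crosstab(data["geohash_user"], data["service_type"]) > 0
print(list(basket.columns))  # ['Painting', 'Plumbing Repair']
```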
Solution 2:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori

data = pd.read_csv('Dataset - Transaction.csv')
data_tr = (data.groupby(['geohash_user', 'service_type']).sum().unstack()
           .reset_index().fillna(0).set_index('geohash_user').droplevel(0, 1))
# 1 where the summed value is at least 1, otherwise 0
data_tr_list = pd.DataFrame(np.where(data_tr.values >= 1, 1, 0), columns=data_tr.columns)
frequent_tr_encoded = apriori(data_tr_list, min_support=0.05, use_colnames=True)
Output (same as Mr. CarlosSR's output)
    support   itemsets
0   0.054093  (Aircond Repair)
1   0.186669  (Aircond Servicing)
2   0.090622  (Electrical Wiring / Power Point)
3   0.078008  (Local Moving - Budget Lorry)
4   0.060556  (Painting)
5   0.170405  (Plumbing Repair)
6   0.054093  (Aircond Repair)
7   0.186669  (Aircond Servicing)
8   0.090622  (Electrical Wiring / Power Point)
9   0.078008  (Local Moving - Budget Lorry)
10  0.060556  (Painting)
11  0.170405  (Plumbing Repair)
12  0.054093  (Aircond Repair)
13  0.186669  (Aircond Servicing)
14  0.090622  (Electrical Wiring / Power Point)
15  0.078008  (Local Moving - Budget Lorry)
16  0.060556  (Painting)
17  0.170405  (Plumbing Repair)
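apriori only looks at the values and column names, so this version finds the same itemsets. One side effect of rebuilding the DataFrame from a bare NumPy array, though, is that the geohash_user index is replaced by 0, 1, 2, ..., which makes it harder to trace rows back to users later. Comparing the DataFrame directly keeps the index. A small sketch with made-up sums:

```python
import numpy as np
import pandas as pd

# Hypothetical per-user sums, shaped like data_tr in the answer
data_tr = pd.DataFrame(
    {"Painting": [0.0, 3.0, 1.0], "Plumbing Repair": [2.0, 0.0, 1.0]},
    index=pd.Index(["u1", "u2", "u3"], name="geohash_user"),
)

# As in the answer: the new DataFrame gets a default 0..n-1 index
via_numpy = pd.DataFrame(np.where(data_tr.values >= 1, 1, 0),
                         columns=data_tr.columns)

# Comparing the DataFrame itself keeps both index and columns
via_pandas = (data_tr >= 1).astype(int)

print(via_numpy.index.tolist())   # [0, 1, 2]
print(via_pandas.index.tolist())  # ['u1', 'u2', 'u3']
```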
Second, at the transaction level:
data_tr = (data.groupby(['transaction_id', 'service_type']).sum().unstack()
           .reset_index().fillna(0).set_index('transaction_id').droplevel(0, 1))
data_tr_list = pd.DataFrame(np.where(data_tr.values >= 1, 1, 0), columns=data_tr.columns)
frequent_tr_encoded = apriori(data_tr_list, min_support=0.05, use_colnames=True)
Output
    support   itemsets
0   0.131081  (Aircond Servicing)
1   0.058486  (Electrical Wiring / Power Point)
2   0.050062  (Local Moving - Budget Lorry)
3   0.114593  (Plumbing Repair)
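The support column is the fraction of rows that contain an itemset. At the transaction level that caps every pair: at most 52 of the 142,103 transactions hold more than one service, so no 2-itemset can have a support above roughly 52 / 142,103 ≈ 0.00037, far below min_support=0.05, which is why only single items appear here. A plain pandas sketch of the support definition on made-up baskets (not mlxtend's implementation, which additionally prunes candidate itemsets level by level; itemset_support is a hypothetical helper):

```python
import pandas as pd

# Hypothetical one-hot baskets (rows: transactions, columns: services)
baskets = pd.DataFrame({
    "Aircond Servicing": [1, 1, 0, 1, 0],
    "Plumbing Repair":   [1, 0, 1, 1, 0],
    "Painting":          [0, 0, 0, 1, 1],
}).astype(bool)

# Support of a single item: share of rows in which it is present
item_support = baskets.mean()

def itemset_support(df, items):
    # Share of rows in which ALL the given items are present
    return df[list(items)].all(axis=1).mean()

pair = itemset_support(baskets, ["Aircond Servicing", "Plumbing Repair"])

# Anything below the threshold would be left out of apriori's result
frequent_items = item_support[item_support >= 0.5]
print(item_support, pair, list(frequent_items.index), sep="\n")
```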
EDIT
The apriori function only accepts a DataFrame whose values are True, False, 0 or 1.
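A quick way to check a DataFrame against those allowed values before calling apriori. is_valid_apriori_input is a hypothetical helper, not part of mlxtend. Converting counts with astype(bool) is usually the simplest fix, and recent mlxtend versions may also warn when given non-bool input.

```python
import pandas as pd

def is_valid_apriori_input(df):
    # Hypothetical helper (not part of mlxtend): True when every cell
    # is 0, 1, True or False
    values = df.to_numpy()
    return bool(((values == 0) | (values == 1)).all())

# Hypothetical raw counts per user and service
counts = pd.DataFrame({"Painting": [0, 3], "Plumbing Repair": [2, 0]})

ok_before = is_valid_apriori_input(counts)              # counts 2 and 3 are not allowed
ok_after = is_valid_apriori_input(counts.astype(bool))  # presence flags are allowed
print(ok_before, ok_after)
```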
Keeping only rows whose sum along axis 1 is at least 2:
data_tr_list = data_tr_list[data_tr_list.sum(axis=1) >= 2]
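This drops rows with a single service: they cannot contain a pair, so they add nothing to 2-itemsets. Support, however, is measured against the rows you pass in, so after filtering every support value (and therefore the meaning of min_support=0.05) is relative to the multi-service rows only. A sketch on made-up baskets:

```python
import pandas as pd

# Hypothetical one-hot baskets, one row per user
data_tr_list = pd.DataFrame({
    "Aircond Servicing": [1, 1, 0, 1],
    "Plumbing Repair":   [1, 0, 0, 1],
    "Painting":          [0, 0, 1, 1],
})

# Keep only rows with at least two services
filtered = data_tr_list[data_tr_list.sum(axis=1) >= 2]

# Support of the same pair before and after filtering
pair_cols = ["Aircond Servicing", "Plumbing Repair"]
pair_all = data_tr_list[pair_cols].all(axis=1).mean()    # 2 of 4 rows
pair_filtered = filtered[pair_cols].all(axis=1).mean()   # 2 of 2 rows
print(len(filtered), pair_all, pair_filtered)
```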