
Optimising Iteration And Substitution Over Large Dataset

I've posted this question here, but as I haven't had an answer so far, I thought I'd also try it on this site, where it seems relevant. I have the following code: import pandas as pd import nump

Solution 1:

Tested on entire file.

Here you go:

=^..^=

import pandas as pd
import numpy as np
import itertools

# Importing the data
df = pd.read_csv('./GPr_test.csv', sep=',', header=None)

# set new data frame
df2 = pd.DataFrame()
pd.options.display.max_colwidth = 200

for index, row in df.iterrows():
    # clean data
    clean_list = [x for x in list(row.values) if str(x) != 'nan']
    # create combinations
    items_combinations = list(itertools.combinations(clean_list, 2))
    # create set combinations
    joint_items_combinations = [':'.join(x) for x in items_combinations]

    # collect rest of item names
    # handle first element
    if index == 0:
        additional_names = list(df.loc[1].values)
        additional_names = [x for x in additional_names if str(x) != 'nan']
    else:
        additional_names = list(df.loc[index-1].values)
        additional_names = [x for x in additional_names if str(x) != 'nan']

    # get set data
    result = []
    for combination, joint_combination in zip(items_combinations, joint_items_combinations):
        set_data = [item for item in clean_list if item not in combination] + [joint_combination]
        result.append((set_data, additional_names))

    # add data to data frame
    data = pd.DataFrame({"result": result})
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df2 = pd.concat([df2, data])


df2 = df2.reset_index().drop(columns=['index'])

For rows:

chicken cinnamon    ginger  onion   soy_sauce
cardamom    coconut pumpkin

Output:

                                                                      result
0   ([ginger, onion, soy_sauce, chicken:cinnamon], [cardamom, coconut, pumpkin])
1   ([cinnamon, onion, soy_sauce, chicken:ginger], [cardamom, coconut, pumpkin])
2   ([cinnamon, ginger, soy_sauce, chicken:onion], [cardamom, coconut, pumpkin])
3   ([cinnamon, ginger, onion, chicken:soy_sauce], [cardamom, coconut, pumpkin])
4   ([chicken, onion, soy_sauce, cinnamon:ginger], [cardamom, coconut, pumpkin])
5   ([chicken, ginger, soy_sauce, cinnamon:onion], [cardamom, coconut, pumpkin])
6   ([chicken, ginger, onion, cinnamon:soy_sauce], [cardamom, coconut, pumpkin])
7   ([chicken, cinnamon, soy_sauce, ginger:onion], [cardamom, coconut, pumpkin])
8   ([chicken, cinnamon, onion, ginger:soy_sauce], [cardamom, coconut, pumpkin])
9   ([chicken, cinnamon, ginger, onion:soy_sauce], [cardamom, coconut, pumpkin])
10  ([pumpkin, cardamom:coconut], [chicken, cinnamon, ginger, onion, soy_sauce])
11  ([coconut, cardamom:pumpkin], [chicken, cinnamon, ginger, onion, soy_sauce])
12  ([cardamom, coconut:pumpkin], [chicken, cinnamon, ginger, onion, soy_sauce])
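
Since the question is about speed on a large file, here is a rough sketch of the same logic that collects everything in a plain Python list, so no DataFrame is grown inside the loop. It is not part of the original answer: `build_records` is a hypothetical helper name, and it assumes each row has already been reduced to a list of item names with the NaN padding removed. Like the loop above, it needs at least two rows.

```python
import itertools


def build_records(rows):
    """Hypothetical helper: rows is a list of lists of item names (NaN removed)."""
    records = []
    for index, items in enumerate(rows):
        # Same neighbour rule as the loop above: row 0 is paired with row 1,
        # every other row with the row before it.
        other = rows[1] if index == 0 else rows[index - 1]
        for pair in itertools.combinations(items, 2):
            # Items not in the pair, followed by the pair joined with ':'
            rest = [item for item in items if item not in pair]
            records.append((rest + [":".join(pair)], list(other)))
    return records


rows = [
    ["chicken", "cinnamon", "ginger", "onion", "soy_sauce"],
    ["cardamom", "coconut", "pumpkin"],
]
records = build_records(rows)
print(len(records))  # 13 (10 pairs from the first row, 3 from the second)
print(records[0])
```

Passing the list to `pd.DataFrame({"result": records})` once at the end should give the same single-column frame as shown in the output above.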
