Parsing CSV in Python

I'm trying to parse a CSV file in Python and print the sum of order_total for each day. The sample CSV file has two columns: order_total and created_datetime.

Solution 1:

I've re-read the question, and if your data is really tab-separated, here is code that does the job (using pandas):

import pandas as pd

# read_csv already returns a DataFrame; header=0 replaces the file's header row with `names`
df = pd.read_csv('file.csv', sep='\t', header=0, names=['order_total', 'created_datetime'])
df['created_datetime'] = pd.to_datetime(df.created_datetime).dt.date
df = df.groupby(['created_datetime']).sum()
print(df)

Gives the following result:

                  order_total
created_datetime
2015-06-01             261.16
2015-06-02             415.76
2015-06-03             132.92

Less code, and probably lower algorithmic complexity.

Solution 2:

This one should do the job.

The csv module has DictReader, which accepts a fieldnames argument, so instead of accessing columns by index (row[0]) you can predefine the column names and access them by name (row['date']).

import csv
from collections import defaultdict
from datetime import datetime


def sum_orders_test(start_date, end_date):
    FIELDNAMES = ['orders', 'date']
    sum_of_orders = defaultdict(float)  # order totals are decimal amounts

    initial_date = datetime.strptime(start_date, '%Y-%m-%d').date()
    final_date = datetime.strptime(end_date, '%Y-%m-%d').date()
    with open("file1.csv", 'r', newline='') as data_file:
        next(data_file)  # Skip the headers
        # The file in the question appears to be tab-separated;
        # drop the delimiter argument for a comma-separated file
        reader = csv.DictReader(data_file, fieldnames=FIELDNAMES, delimiter='\t')
        for row in reader:
            # Assumes the datetime column starts with a YYYY-MM-DD date
            row_date = datetime.strptime(row['date'][:10], '%Y-%m-%d').date()
            if initial_date <= row_date <= final_date:
                sum_of_orders[str(row_date)] += float(row['orders'])
    return sum_of_orders
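
To see how DictReader's fieldnames argument behaves in isolation, here is a minimal sketch that reads from an in-memory string instead of a file. The sample rows, the timestamp format, and the helper name sum_by_day are made up for illustration; they are not from the question.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated sample; these values are made up for illustration.
SAMPLE = (
    "order_total\tcreated_datetime\n"
    "10.50\t2015-06-01 09:00:00\n"
    "4.25\t2015-06-01 17:30:00\n"
    "7.00\t2015-06-02 08:15:00\n"
)


def sum_by_day(lines):
    """Sum the 'orders' column per calendar day (hypothetical helper)."""
    next(lines)  # skip the header row, since fieldnames are supplied explicitly
    reader = csv.DictReader(lines, fieldnames=['orders', 'date'], delimiter='\t')
    totals = defaultdict(float)
    for row in reader:
        # Assumes 'YYYY-MM-DD HH:MM:SS'; keep only the part before the space
        day = row['date'].split(' ')[0]
        totals[day] += float(row['orders'])
    return dict(totals)


totals = sum_by_day(io.StringIO(SAMPLE))
print(totals)
```

Because every row is handled by name, reordering the columns in the file only requires changing the fieldnames list, not the loop body.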

Solution 3:

You might have a .csv file extension, but your file actually seems to be tab-separated.

You can load it as a pandas DataFrame by specifying the separator.

import pandas as pd
data = pd.read_csv('file.csv', sep='\t')

Then split the datetime column into date and time:

# Add the new columns to the existing frame so order_total is kept
data[['date', 'time']] = data.created_datetime.str.split(' ', n=1, expand=True)

Then, for each unique date, compute its order_total sum:

for i in data['date'].unique():
    print(i, data[data['date'] == i].order_total.sum())
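
The loop above filters the frame once per unique date. As a possible alternative (not part of the original answer), a groupby computes the same per-date sums in a single call. The sketch below uses made-up sample rows held in memory, and assumes the timestamps look like 'YYYY-MM-DD HH:MM:SS'.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; these values are made up for illustration.
SAMPLE = (
    "order_total\tcreated_datetime\n"
    "10.50\t2015-06-01 09:00:00\n"
    "4.25\t2015-06-01 17:30:00\n"
    "7.00\t2015-06-02 08:15:00\n"
)

data = pd.read_csv(io.StringIO(SAMPLE), sep='\t')

# Keep only the date part of each timestamp string
data['date'] = data['created_datetime'].str.split(' ', n=1).str[0]

# One pass over the frame: sum order_total within each date group
daily_totals = data.groupby('date')['order_total'].sum()
print(daily_totals)
```

The result is a Series indexed by date, so daily_totals['2015-06-01'] looks up a single day's total.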
