Parsing CSV in Python
I'm trying to parse a CSV file in Python and print the sum of order_total for each day. Below is the sample CSV file:
order_total created_datetime
Solution 1:
I've re-read the question, and if your data really is tab-separated, the following code does the job (using pandas):
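import pandas as pd
# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed.
# header=0 replaces the file's header row with the given names; drop it if the file has no header row.
df = pd.read_csv('file.csv', sep='\t', header=0, names=['order_total', 'created_datetime'])
# Keep only the calendar date of each timestamp
df['created_datetime'] = pd.to_datetime(df.created_datetime).dt.date
df = df.groupby(['created_datetime']).sum()
print(df)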
Gives the following result:
                  order_total
created_datetime
2015-06-01             261.16
2015-06-02             415.76
2015-06-03             132.92
This takes less code, and grouping visits each row once instead of rescanning the data for every date.
Solution 2:
This one should do the job. The csv module has DictReader, which accepts fieldnames, so instead of accessing columns by index (row[0]) you can predefine column names (row['date']).
import csv
from collections import defaultdict
from datetime import datetime

def sum_orders_test(start_date, end_date):
    FIELDNAMES = ['orders', 'date']
    sum_of_orders = defaultdict(float)  # order totals may have decimals
    initial_date = datetime.strptime(start_date, '%Y-%m-%d').date()
    final_date = datetime.strptime(end_date, '%Y-%m-%d').date()
    with open("file1.csv", 'r', newline='') as data_file:
        next(data_file)  # Skip the headers
        # Assumes tab-separated data; change delimiter if the file uses commas
        reader = csv.DictReader(data_file, fieldnames=FIELDNAMES, delimiter='\t')
        for row in reader:
            # Assumes the timestamp starts with YYYY-MM-DD; keep only the date part
            day = datetime.strptime(row['date'].split()[0], '%Y-%m-%d').date()
            # Add every row inside the requested range to its day's total
            if initial_date <= day <= final_date:
                sum_of_orders[str(day)] += float(row['orders'])
    return sum_of_orders
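To see the index-versus-name difference described above in isolation, here is a minimal sketch on in-memory data. The sample rows are made up for illustration, the tab delimiter is an assumption about the question's file, and read_by_index and read_by_name are hypothetical helper names.

```python
import csv
import io

# Made-up sample rows; tab-separated, as the question's file appears to be.
SAMPLE = (
    "order_total\tcreated_datetime\n"
    "10.50\t2015-06-01 08:15:00\n"
    "20.25\t2015-06-02 09:30:00\n"
)

def read_by_index(text):
    """Plain csv.reader: each row is a list, so columns are accessed by position."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    return [(row[0], row[1]) for row in reader]

def read_by_name(text):
    """csv.DictReader with fieldnames: each row is a dict keyed by our own names."""
    stream = io.StringIO(text)
    next(stream)  # skip the header line, since fieldnames are supplied explicitly
    reader = csv.DictReader(stream, fieldnames=["orders", "date"], delimiter="\t")
    return [(row["orders"], row["date"]) for row in reader]

print(read_by_index(SAMPLE))
print(read_by_name(SAMPLE))
```

Both helpers return the same values; the named version simply stays readable if the column order ever changes.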
Solution 3:
You might have a .csv file extension, but your file actually seems to be tab-separated. You can load it as a pandas DataFrame by specifying the separator.
import pandas as pd
data = pd.read_csv('file.csv', sep='\t')
Then split the datetime column into date and time, keeping the existing columns:
data[['date', 'time']] = data.created_datetime.str.split(' ', n=1, expand=True)
Then, for each unique date, compute its order_total sum:
for i in data['date'].unique():
    print(i, data[data['date'] == i].order_total.sum())
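If it is unclear whether a file is comma- or tab-separated, the standard library's csv.Sniffer can make a guess from a sample of the text. The sketch below is only an illustration: the sample rows are made up, detect_delimiter is a hypothetical helper name, and Sniffer is a heuristic that may raise csv.Error when it cannot decide.

```python
import csv

# Made-up sample rows in the layout the question describes.
SAMPLE = (
    "order_total\tcreated_datetime\n"
    "10.50\t2015-06-01 08:15:00\n"
    "20.25\t2015-06-01 17:40:00\n"
    "30.00\t2015-06-02 09:30:00\n"
)

def detect_delimiter(sample, candidates=",\t"):
    """Return the delimiter csv.Sniffer guesses, restricted to the candidates."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter

print(repr(detect_delimiter(SAMPLE)))
```

The detected character can then be passed as sep to pd.read_csv; for a real file, reading the first few kilobytes as the sample is usually enough.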