
Pandas Remove Null Values When To_json

I have a pandas DataFrame that I want to save in JSON format. The pandas docs say: Note NaN's, NaT's and None will be converted to null and datetime obj

Solution 1:

The solution above doesn't actually produce results in the 'records' format. This solution also uses the json package, but produces exactly the result asked for in the original question.

import pandas as pd
import json

json.dumps([row.dropna().to_dict() for index,row in df.iterrows()])

Additionally, if you want to include the index (and you are on Python 3.5+) you can do:

json.dumps([{'index': index, **row.dropna().to_dict()} for index, row in df.iterrows()])
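
As a quick check of the row-by-row approach above, here is a minimal self-contained sketch. The sample frame and the names `records` and `result` are assumptions made for illustration; they are not from the original question.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical sample frame: every row after the first is missing something.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": [7.0, np.nan, np.nan],
})

# One dict per row, keeping only the non-null entries of that row.
records = [row.dropna().to_dict() for _, row in df.iterrows()]

# Serialize the list of dicts; this is the 'records' layout without nulls.
result = json.dumps(records)
print(result)
```

Each row keeps only the keys it has a value for, so the second row here serializes with just "b" and the third with just "a".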

Solution 2:

The following gets close to what you want. Essentially, we create a list of the non-NaN values and then call to_json on it:

In [136]:
df.apply(lambda x: [x.dropna()], axis=1).to_json()

Out[136]:
'{"0":[{"a":1.0,"b":4.0,"c":7.0}],"1":[{"b":5.0}],"2":[{"a":3.0}]}'

Creating a list is necessary here; otherwise pandas will try to align the result with the shape of your original df, which reintroduces the NaN values you want to avoid:

In [138]:
df.apply(lambda x: pd.Series(x.dropna()), axis=1).to_json()

Out[138]:
'{"a":{"0":1.0,"1":null,"2":3.0},"b":{"0":4.0,"1":5.0,"2":null},"c":{"0":7.0,"1":null,"2":null}}'

Also, calling list on the result of dropna will broadcast the result to the original shape, much like filling:

In [137]:
df.apply(lambda x: list(x.dropna()), axis=1).to_json()

Out[137]:
'{"a":{"0":1.0,"1":5.0,"2":3.0},"b":{"0":4.0,"1":5.0,"2":3.0},"c":{"0":7.0,"1":5.0,"2":3.0}}'
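
The outputs above come from one particular pandas session, and how apply shapes its result may differ between versions. As a sketch that does not depend on apply at all, one could let to_json write the nulls and then strip them from the parsed records. This is a swapped-in technique, not the method of this answer; the sample frame and the names `records` and `cleaned` are assumptions.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical sample frame matching the shape of the outputs shown above.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": [7.0, np.nan, np.nan],
})

# Let pandas serialize first (NaN becomes null), then parse the JSON back.
records = json.loads(df.to_json(orient="records"))

# Drop every key whose value was serialized as null (None after parsing).
cleaned = [{k: v for k, v in rec.items() if v is not None} for rec in records]

result = json.dumps(cleaned)
print(result)
```

The trade-off is a serialize-parse-serialize round trip, which costs extra time on large frames but keeps whatever value formatting to_json applies.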

Solution 3:

The most-upvoted answer uses iterrows(), which is not considered best practice because it performs poorly on large dataframes. The following solution performed only about the same in my use case, but it may do better on other datasets.

So one can instead use

df.agg(lambda x: x.dropna().to_dict(), axis=1)

to convert the dataframe into a series of dictionaries without null values. The series can then be turned into a JSON list:

import json
json.dumps(list(df.agg(lambda x: x.dropna().to_dict(), axis=1)))
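
To make the intermediate series of dictionaries visible, here is a small sketch of the same agg call on a mixed-type frame. The column names and the variables `dicts` and `result` are hypothetical.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical frame with a text column and a numeric column, each with a gap.
df = pd.DataFrame({
    "name": ["x", None, "z"],
    "score": [1.5, 2.5, np.nan],
})

# Row-wise aggregation: each row becomes a dict of its non-null entries,
# so the result is a Series whose values are dicts.
dicts = df.agg(lambda x: x.dropna().to_dict(), axis=1)

# Turn the Series into a plain list before handing it to json.dumps.
result = json.dumps(list(dicts))
print(result)
```

Both None and NaN count as missing for dropna, so the second row keeps only "score" and the third only "name".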

Solution 4:

I had the same problem, and my solution is to use the json module instead of pd.DataFrame.to_json().

The approach is to:

  1. drop the NaN values when converting the DataFrame to a dict, and then
  2. convert the dict to JSON using json.dumps()

Here is the code:

import pandas as pd
import json
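# df.items() is used below in place of compat.iteritems, which recent pandas releases no longer provide.

def to_dict_dropna(df):
    # Walk the columns; for each one, drop NaN and cast what remains to int.
    # int(k) assumes the column labels are integer-like.
    return {int(k): v.dropna().astype(int).to_dict() for k, v in df.items()}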

json.dumps(to_dict_dropna(df))
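
A short usage sketch of this column-wise helper follows. It assumes integer-like column labels and whole-number values, since the function casts both to int; the sample frame is hypothetical and the helper is rewritten here with df.items() so the block runs on its own.

```python
import json

import numpy as np
import pandas as pd


def to_dict_dropna(df):
    # Column-oriented: {column label: {row label: value}}, NaN entries dropped.
    return {int(k): v.dropna().astype(int).to_dict() for k, v in df.items()}


# Hypothetical frame with integer column labels and one gap per column.
df = pd.DataFrame({0: [1.0, np.nan], 1: [np.nan, 2.0]})

nested = to_dict_dropna(df)
result = json.dumps(nested)
print(result)
```

Unlike the row-wise answers, the outer keys here are columns and the inner keys are row labels, and json.dumps writes both as strings because JSON object keys must be strings.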
