
Insert Rows Into Pandas Dataframe While Maintaining Column Data Types

What's the best way to insert new rows into an existing pandas DataFrame while maintaining column data types and, at the same time, giving user-defined fill values for columns that

Solution 1:

As you found, since NaN is a float, adding NaN to a series may cause it to be either upcast to float or converted to object. You are right that this is not a desirable outcome.

There is no straightforward approach. My suggestion is to store your input row data in a dictionary and combine it with a dictionary of defaults before appending. Note that this works because pd.DataFrame.append accepts a dict argument (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the code here needs an older pandas).

In Python 3.5 and later, you can use the syntax {**d1, **d2} to combine two dictionaries, with values from the second taking precedence.

default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}

row = {'name': 'Cindy', 'age': 42}

df = df.append({**default, **row}, ignore_index=True)

print(df)

   age  has_children   name  weight
0   45          True    Bob   143.2
1   40          True    Sue   130.2
2   10         False    Tom    34.9
3   42         False  Cindy     0.0

print(df.dtypes)

age               int64
has_children       bool
name             object
weight          float64
dtype: object
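For pandas versions where DataFrame.append is not available, the same defaults-then-append idea can be sketched with pd.concat. This is only a minimal sketch under some assumptions: the helper name insert_row is hypothetical, the sample frame is reconstructed from the output above, and the astype call is an extra safeguard that casts the new row to the existing column dtypes before concatenating.

```python
import pandas as pd

# Sample frame reconstructed from the output shown above (an assumption).
df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False],
})

default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}
row = {'name': 'Cindy', 'age': 42}


def insert_row(frame, row, default):
    """Hypothetical helper: append one row, filling gaps from `default`."""
    # Merge the dicts so that every column gets a value; `row` wins on conflicts.
    new = pd.DataFrame([{**default, **row}])
    # Match the column order and dtypes of the existing frame before concatenating.
    new = new[frame.columns].astype(frame.dtypes.to_dict())
    return pd.concat([frame, new], ignore_index=True)


result = insert_row(df, row, default)
print(result)
print(result.dtypes)
```

Because the new one-row frame already has the same dtypes as the original, the concatenation has no reason to upcast any column.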

Solution 2:

This happens because NaN is a float, while True and False are bool. Appending NaN leaves mixed dtypes in one column, so pandas automatically converts the column to object.

Another instance of this: if a column holds only integer values and you append a float, pandas converts the entire column to float, so the existing values are displayed with a trailing '.0'.
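The upcasting described here can be reproduced in a few lines. The snippet below is a small illustration rather than anything from the original answer; the variable names are made up, and reindex is used only as a convenient way to introduce a missing value.

```python
import pandas as pd

ints = pd.Series([45, 40, 10])
bools = pd.Series([True, True, False])

# Reindexing to a longer index introduces NaN in the new position.
ints_with_nan = ints.reindex(range(4))
bools_with_nan = bools.reindex(range(4))

# Adding a float value to an integer series upcasts the whole series.
ints_plus_float = pd.concat([ints, pd.Series([12.5])], ignore_index=True)

print(ints.dtype, '->', ints_with_nan.dtype)     # int64 -> float64
print(bools.dtype, '->', bools_with_nan.dtype)   # bool -> object
print(ints.dtype, '->', ints_plus_float.dtype)   # int64 -> float64
```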


Edit

Based on the comments, here is another, hacky way to convert the object column back to bool dtype.

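import pandas

df = pandas.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False]
})
row = {'name': 'Cindy', 'age': 12}
# The missing 'weight' and 'has_children' values become NaN on append
# (DataFrame.append requires pandas < 2.0).
df = df.append(row, ignore_index=True)
# Replace NaN with False, then cast the object column back to bool.
df['has_children'] = df['has_children'].fillna(False).astype('bool')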

The new DataFrame now looks like this:

   age  has_children   name  weight
0   45          True    Bob   143.2
1   40          True    Sue   130.2
2   10         False    Tom    34.9
3   12         False  Cindy     NaN
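The same repair-after-append trick can be generalized to several columns at once. The following is a rough sketch, assuming pd.concat in place of DataFrame.append and a hypothetical fill_values dict for the columns the new row leaves out; it records the dtypes before inserting and casts back afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False],
})
row = {'name': 'Cindy', 'age': 12}

# Hypothetical per-column fill values for whatever the new row omits.
fill_values = {'weight': 0.0, 'has_children': False}

# Remember the dtypes before the insert so they can be restored afterwards.
original_dtypes = df.dtypes.to_dict()

# Columns missing from `row` come out as NaN after the concatenation.
appended = pd.concat([df, pd.DataFrame([row])], ignore_index=True)

# Fill the gaps column by column, then cast every column back.
fixed = appended.fillna(fill_values).astype(original_dtypes)

print(fixed)
print(fixed.dtypes)
```

Note that astype will raise if any NaN is left in an integer column, so fill_values needs an entry for every column the new row may leave out.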
