
Numpy: How To Add A Column To An Existing Structured Array?

I have a starting array such as:

[(1, [-112.01268501699997, 40.64249414272372]) (2, [-111.86145708699996, 40.4945008710162])]

The first column is an int and the second is a list.

Solution 1:

You have to create a new dtype that contains the new field.

For example, here's a:

In [86]: a
Out[86]: 
array([(1, [-112.01268501699997, 40.64249414272372]),
       (2, [-111.86145708699996, 40.4945008710162])], 
      dtype=[('i', '<i8'), ('loc', '<f8', (2,))])

a.dtype.descr is [('i', '<i8'), ('loc', '<f8', (2,))]; i.e. a list of field types. We'll create a new dtype by adding ('USNG', 'S100') to the end of that list:

In [87]: new_dt = np.dtype(a.dtype.descr + [('USNG', 'S100')])

Now create a new structured array, b. I used zeros here, so the string fields will start out with the value ''. You could also use empty. The strings will then contain garbage, but that won't matter if you immediately assign values to them.

In [88]: b = np.zeros(a.shape, dtype=new_dt)

Copy over the existing data from a to b:

In [89]: b['i'] = a['i']

In [90]: b['loc'] = a['loc']

Here's b now:

In [91]: b
Out[91]: 
array([(1, [-112.01268501699997, 40.64249414272372], ''),
       (2, [-111.86145708699996, 40.4945008710162], '')], 
      dtype=[('i', '<i8'), ('loc', '<f8', (2,)), ('USNG', 'S100')])

Fill in the new field with some data:

In [93]: b['USNG'] = ['FOO', 'BAR']

In [94]: b
Out[94]: 
array([(1, [-112.01268501699997, 40.64249414272372], 'FOO'),
       (2, [-111.86145708699996, 40.4945008710162], 'BAR')], 
      dtype=[('i', '<i8'), ('loc', '<f8', (2,)), ('USNG', 'S100')])
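The steps above can be wrapped in a small helper. This is only a sketch: add_field is a hypothetical name, not a NumPy function, and it assumes a packed dtype (for aligned or padded dtypes, a.dtype.descr can contain extra padding entries that this does not handle).

```python
import numpy as np

def add_field(arr, name, fmt, values=None):
    """Return a copy of structured array `arr` with one extra field appended.

    Hypothetical helper for illustration; assumes `arr.dtype` has no padding.
    """
    # Build the new dtype from the existing field list plus the new field.
    new_dt = np.dtype(arr.dtype.descr + [(name, fmt)])
    # zeros gives the new field a well-defined initial value.
    out = np.zeros(arr.shape, dtype=new_dt)
    # Copy every existing field over by name.
    for field in arr.dtype.names:
        out[field] = arr[field]
    # Optionally fill the new field.
    if values is not None:
        out[name] = values
    return out

a = np.array([(1, [-112.01268501699997, 40.64249414272372]),
              (2, [-111.86145708699996, 40.4945008710162])],
             dtype=[('i', '<i8'), ('loc', '<f8', (2,))])

b = add_field(a, 'USNG', 'S100', [b'FOO', b'BAR'])
print(b.dtype.names)
```

The original array a is left untouched; b is a new array, since a structured array's dtype cannot be extended in place.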

Solution 2:

Have you tried using numpy's recfunctions?

import numpy.lib.recfunctions as rfn

It has some very useful functions for structured arrays.

For your case, I think it could be accomplished with:

a = rfn.append_fields(a, 'USNG', np.empty(a.shape[0], dtype='|S100'), dtypes='|S100')

Tested here and it worked.
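For reference, here is a self-contained sketch of the same call with real values instead of np.empty. By default append_fields returns a masked array; passing usemask=False asks for a plain ndarray. The 'FOO'/'BAR' values are borrowed from Solution 1 purely for illustration.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

a = np.array([(1, [-112.01268501699997, 40.64249414272372]),
              (2, [-111.86145708699996, 40.4945008710162])],
             dtype=[('i', '<i8'), ('loc', '<f8', (2,))])

# Data for the new field; the dtype of this array becomes the field's dtype.
usng = np.array([b'FOO', b'BAR'], dtype='S100')

# usemask=False returns a regular ndarray rather than a MaskedArray.
b = rfn.append_fields(a, 'USNG', usng, usemask=False)
print(b.dtype.names)
print(b['USNG'])
```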


merge_arrays

As GMSL mentioned in the comments, it is also possible to do this with rfn.merge_arrays, as shown below:

a = np.array([(1, [-112.01268501699997, 40.64249414272372]),
       (2, [-111.86145708699996, 40.4945008710162])], 
      dtype=[('i', '<i8'), ('loc', '<f8', (2,))])
a2 = np.full(a.shape[0], '', dtype=[('USNG', '|S100')])
a3 = rfn.merge_arrays((a, a2), flatten=True)

a3 will have the value:

array([(1, [-112.01268502,   40.64249414], b''),
       (2, [-111.86145709,   40.49450087], b'')],
      dtype=[('i', '<i8'), ('loc', '<f8', (2,)), ('USNG', 'S100')])

Solution 3:

If pandas is an option, it makes adding a column to a recarray much easier.
  1. Read the current recarray with pandas.DataFrame or pandas.DataFrame.from_records.
  2. Add the new column of data to the dataframe.
  3. Export the dataframe to a recarray with pandas.DataFrame.to_records.

import pandas as pd
import numpy as np

# current recarray
data = np.rec.array([(1, list([-112.01268501699997, 40.64249414272372])), (2, list([-111.86145708699996, 40.4945008710162]))], dtype=[('i', '<i8'), ('loc', 'O')])

# create dataframe
df = pd.DataFrame(data)

# display(df)
   i                                       loc
0  1  [-112.01268501699997, 40.64249414272372]
1  2   [-111.86145708699996, 40.4945008710162]

# add new column
df['USNG'] = ['Note 1', 'Note 2']

# display(df)
   i                                       loc    USNG
0  1  [-112.01268501699997, 40.64249414272372]  Note 1
1  2   [-111.86145708699996, 40.4945008710162]  Note 2

# write the dataframe to recarray
data = df.to_records(index=False)

print(data)
[out]:
rec.array([(1, list([-112.01268501699997, 40.64249414272372]), 'Note 1'),
           (2, list([-111.86145708699996, 40.4945008710162]), 'Note 2')],
          dtype=[('i', '<i8'), ('loc', 'O'), ('USNG', 'O')])

Solution 4:

The question is precisely: "Any suggestions on why this is happening?"

Fundamentally, this is a bug; it has been an open ticket at numpy since 2012.

Solution 5:

Tonsic mentioned the recfunctions module, imported with import numpy.lib.recfunctions as rfn. In this case, a simpler function from that module that would work for you is rfn.merge_arrays().
