Numpy: How To Add A Column To An Existing Structured Array?
Solution 1:
You have to create a new dtype that contains the new field.
For example, here's a:
In [86]: a
Out[86]:
array([(1, [-112.01268501699997, 40.64249414272372]),
(2, [-111.86145708699996, 40.4945008710162])],
dtype=[('i', '<i8'), ('loc', '<f8', (2,))])
a.dtype.descr is [('i', '<i8'), ('loc', '<f8', (2,))]; i.e. a list of field types. We'll create a new dtype by adding ('USNG', 'S100') to the end of that list:
In [87]: new_dt = np.dtype(a.dtype.descr + [('USNG', 'S100')])
Now create a new structured array, b. I used zeros here, so the string fields will start out with the value ''. You could also use empty. The strings will then contain garbage, but that won't matter if you immediately assign values to them.
In [88]: b = np.zeros(a.shape, dtype=new_dt)
Copy over the existing data from a to b:
In [89]: b['i'] = a['i']
In [90]: b['loc'] = a['loc']
Here's b now:
In [91]: b
Out[91]:
array([(1, [-112.01268501699997, 40.64249414272372], ''),
(2, [-111.86145708699996, 40.4945008710162], '')],
dtype=[('i', '<i8'), ('loc', '<f8', (2,)), ('USNG', 'S100')])
Fill in the new field with some data:
In [93]: b['USNG'] = ['FOO', 'BAR']
In [94]: b
Out[94]:
array([(1, [-112.01268501699997, 40.64249414272372], 'FOO'),
(2, [-111.86145708699996, 40.4945008710162], 'BAR')],
dtype=[('i', '<i8'), ('loc', '<f8', (2,)), ('USNG', 'S100')])
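The steps above can be collected into a small helper. This is only a sketch: the name add_field and its parameters are hypothetical (they are not part of NumPy), and it assumes a simple packed dtype, since a.dtype.descr may not round-trip cleanly for dtypes with custom offsets or padding.

```python
import numpy as np

def add_field(arr, name, dtype, values=None):
    """Return a copy of structured array `arr` with one extra field appended.

    Hypothetical helper, not a NumPy function.
    """
    # Build the new dtype by extending the list of existing field descriptions.
    new_dt = np.dtype(arr.dtype.descr + [(name, dtype)])
    # zeros() gives the new field a well-defined initial value.
    out = np.zeros(arr.shape, dtype=new_dt)
    # Copy every existing field across by name.
    for field in arr.dtype.names:
        out[field] = arr[field]
    # Optionally fill the new field right away.
    if values is not None:
        out[name] = values
    return out

a = np.array([(1, [-112.01268501699997, 40.64249414272372]),
              (2, [-111.86145708699996, 40.4945008710162])],
             dtype=[('i', '<i8'), ('loc', '<f8', (2,))])
b = add_field(a, 'USNG', 'S100', [b'FOO', b'BAR'])
```

The original array a is left unchanged; b is a new array with its own memory.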
Solution 2:
Have you tried using numpy's recfunctions?
import numpy.lib.recfunctions as rfn
It has some very useful functions for structured arrays.
For your case, I think it could be accomplished with:
a = rfn.append_fields(a, 'USNG', np.empty(a.shape[0], dtype='|S100'), dtypes='|S100')
Tested here and it worked.
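One detail worth knowing: by default append_fields uses usemask=True and returns a masked array. If a plain structured ndarray is wanted, passing usemask=False should do it. The snippet below is a minimal sketch of that variant; the sample values b'FOO' and b'BAR' are made up for illustration and passed in directly instead of an uninitialised np.empty buffer.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

a = np.array([(1, [-112.01268501699997, 40.64249414272372]),
              (2, [-111.86145708699996, 40.4945008710162])],
             dtype=[('i', '<i8'), ('loc', '<f8', (2,))])

# Illustrative values for the new field (assumption: one value per row).
usng = np.array([b'FOO', b'BAR'], dtype='S100')

# usemask=False asks for a plain ndarray instead of a MaskedArray.
b = rfn.append_fields(a, 'USNG', usng, dtypes='S100', usemask=False)
```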
merge_arrays
As GMSL mentioned in the comments, this can also be done with rfn.merge_arrays, as shown below:
a = np.array([(1, [-112.01268501699997, 40.64249414272372]),
(2, [-111.86145708699996, 40.4945008710162])],
dtype=[('i', '<i8'), ('loc', '<f8', (2,))])
a2 = np.full(a.shape[0], '', dtype=[('USNG', '|S100')])
a3 = rfn.merge_arrays((a, a2), flatten=True)
a3 will have the value:
array([(1, [-112.01268502, 40.64249414], b''),
(2, [-111.86145709, 40.49450087], b'')],
dtype=[('i', '<i8'), ('loc', '<f8', (2,)), ('USNG', 'S100')])
Solution 3:
- If pandas is an option, it makes adding a column to a recarray much easier.
- Additionally, the data will be in a form that's easily analyzed.
- numpy is a pandas dependency, and makes many operations easier.
- Also see How to add a column to numpy recarry as another example.
- Read the current recarray with pandas.DataFrame or pandas.DataFrame.from_records.
- Add the new column of data to the dataframe.
- Export the dataframe to a recarray with pandas.DataFrame.to_records.
import pandas as pd
import numpy as np
# current recarray
data = np.rec.array([(1, list([-112.01268501699997, 40.64249414272372])), (2, list([-111.86145708699996, 40.4945008710162]))], dtype=[('i', '<i8'), ('loc', 'O')])
# create dataframe
df = pd.DataFrame(data)
# display(df)
   i                                       loc
0  1  [-112.01268501699997, 40.64249414272372]
1  2   [-111.86145708699996, 40.4945008710162]
# add new column
df['USNG'] = ['Note 1', 'Note 2']
# display(df)
   i                                       loc    USNG
0  1  [-112.01268501699997, 40.64249414272372]  Note 1
1  2   [-111.86145708699996, 40.4945008710162]  Note 2
# write the dataframe to recarray
data = df.to_records(index=False)
print(data)
[out]:
rec.array([(1, list([-112.01268501699997, 40.64249414272372]), 'Note 1'),
(2, list([-111.86145708699996, 40.4945008710162]), 'Note 2')],
dtype=[('i', '<i8'), ('loc', 'O'), ('USNG', 'O')])
Solution 4:
The question is precisely: "Any suggestions on why this is happening?"
Fundamentally, this is a bug; it has been an open ticket at numpy since 2012.
Solution 5:
Tonsic mentioned the recfunctions, imported with import numpy.lib.recfunctions as rfn. In this case, a simpler recfunctions function that would work for you is rfn.merge_arrays() (docs).