Numpy: How To Add A Column To An Existing Structured Array?
Solution 1:
You have to create a new dtype that contains the new field.
For example, here's a:
In [86]: a
Out[86]:
array([(1, [-112.01268501699997, 40.64249414272372]),
(2, [-111.86145708699996, 40.4945008710162])],
dtype=[('i', '<i8'), ('loc', '<f8', (2,))])
a.dtype.descr is [('i', '<i8'), ('loc', '<f8', (2,))]; i.e. a list of field types. We'll create a new dtype by adding ('USNG', 'S100') to the end of that list:
In [87]: new_dt = np.dtype(a.dtype.descr + [('USNG', 'S100')])
Now create a new structured array, b. I used zeros here, so the string fields will start out with the value ''. You could also use empty. The strings will then contain garbage, but that won't matter if you immediately assign values to them.
In [88]: b = np.zeros(a.shape, dtype=new_dt)
Copy over the existing data from a to b:
In [89]: b['i'] = a['i']
In [90]: b['loc'] = a['loc']
Here's b now:
In [91]: b
Out[91]:
array([(1, [-112.01268501699997, 40.64249414272372], ''),
(2, [-111.86145708699996, 40.4945008710162], '')],
dtype=[('i', '<i8'), ('loc', '<f8', (2,)), ('USNG', 'S100')])
Fill in the new field with some data:
In [93]: b['USNG'] = ['FOO', 'BAR']
In [94]: b
Out[94]:
array([(1, [-112.01268501699997, 40.64249414272372], 'FOO'),
(2, [-111.86145708699996, 40.4945008710162], 'BAR')],
dtype=[('i', '<i8'), ('loc', '<f8', (2,)), ('USNG', 'S100')])
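The steps above can be wrapped in a small helper. This is only a minimal sketch, not something NumPy provides: the name add_field and its parameters are hypothetical, and it assumes a plain, unpadded dtype for which a.dtype.descr can be passed straight back to np.dtype.

```python
import numpy as np

def add_field(arr, name, dtype, values=None):
    """Return a copy of structured array `arr` with one extra field appended.

    Hypothetical helper; assumes `arr.dtype.descr` describes the dtype fully
    (no padding or alignment entries).
    """
    # Build the new dtype by appending the new field to the existing field list.
    new_dt = np.dtype(arr.dtype.descr + [(name, dtype)])
    # zeros() gives the new field a defined starting value (b'' for 'S' types).
    out = np.zeros(arr.shape, dtype=new_dt)
    # Copy every existing field across by name.
    for field in arr.dtype.names:
        out[field] = arr[field]
    # Optionally fill the new field.
    if values is not None:
        out[name] = values
    return out

a = np.array([(1, [-112.01268501699997, 40.64249414272372]),
              (2, [-111.86145708699996, 40.4945008710162])],
             dtype=[('i', '<i8'), ('loc', '<f8', (2,))])
b = add_field(a, 'USNG', 'S100', [b'FOO', b'BAR'])
```

The original array a is left untouched; b is a new array with its own memory.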
Solution 2:
Have you tried using numpy's recfunctions?
import numpy.lib.recfunctions as rfn
It has some very useful functions for structured arrays.
For your case, I think it could be accomplished with:
a = rfn.append_fields(a, 'USNG', np.empty(a.shape[0], dtype='|S100'), dtypes='|S100')
Tested here and it worked.
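One detail worth knowing: append_fields returns a masked array unless usemask=False is passed. The sketch below illustrates this on a deliberately simplified array with scalar fields only; the field name x and the sample values are assumptions made for the example, not data from the question.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

# Simplified stand-in for the question's array (scalar fields only).
a = np.array([(1, 2.5), (2, 3.5)], dtype=[('i', '<i8'), ('x', '<f8')])

# Values for the new column, already in the target dtype.
usng = np.array([b'FOO', b'BAR'], dtype='S100')

# usemask=False asks for a plain ndarray instead of a MaskedArray.
b = rfn.append_fields(a, 'USNG', usng, usemask=False)
```

With the default usemask=True, the same call would return a numpy.ma.MaskedArray instead.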
merge_arrays
As GMSL mentioned in the comments, this can also be done with rfn.merge_arrays, as shown below:
a = np.array([(1, [-112.01268501699997, 40.64249414272372]),
(2, [-111.86145708699996, 40.4945008710162])],
dtype=[('i', '<i8'), ('loc', '<f8', (2,))])
a2 = np.full(a.shape[0], '', dtype=[('USNG', '|S100')])
a3 = rfn.merge_arrays((a, a2), flatten=True)
a3 will have the value:
array([(1, [-112.01268502, 40.64249414], b''),
(2, [-111.86145709, 40.49450087], b'')],
dtype=[('i', '<i8'), ('loc', '<f8', (2,)), ('USNG', 'S100')])
Solution 3:
- If pandas is an option, it makes adding a column to a recarray much easier.
- Additionally, the data will be in a form that's easily analyzed.
- numpy is a pandas dependency, and makes many operations easier.
- Also see How to add a column to numpy recarry as another example.
- Read the current recarray with pandas.DataFrame or pandas.DataFrame.from_records.
- Add the new column of data to the dataframe.
- Export the dataframe to a recarray with pandas.DataFrame.to_records.
import pandas as pd
import numpy as np
# current recarray
data = np.rec.array([(1, list([-112.01268501699997, 40.64249414272372])), (2, list([-111.86145708699996, 40.4945008710162]))], dtype=[('i', '<i8'), ('loc', 'O')])
# create dataframe
df = pd.DataFrame(data)
# display(df)
   i                                       loc
0  1  [-112.01268501699997, 40.64249414272372]
1  2   [-111.86145708699996, 40.4945008710162]
# add new column
df['USNG'] = ['Note 1', 'Note 2']
# display(df)
   i                                       loc    USNG
0  1  [-112.01268501699997, 40.64249414272372]  Note 1
1  2   [-111.86145708699996, 40.4945008710162]  Note 2
# write the dataframe to recarray
data = df.to_records(index=False)
print(data)
[out]:
rec.array([(1, list([-112.01268501699997, 40.64249414272372]), 'Note 1'),
(2, list([-111.86145708699996, 40.4945008710162]), 'Note 2')],
dtype=[('i', '<i8'), ('loc', 'O'), ('USNG', 'O')])
Solution 4:
The question is precisely: "Any suggestions on why this is happening?"
Fundamentally, this is a bug: it has been an open ticket at numpy since 2012.
Solution 5:
Tonsic mentioned recfunctions, imported with import numpy.lib.recfunctions as rfn. In this case, a simpler function from that module that would work for you is rfn.merge_arrays().