Description
I want to aggregate my dataframe by id, and this is not the first time I have run into this error. It seems to occur because the aggregation function is not allowed to return a numpy array.
There are dozens of use cases where rows are aggregated into numpy arrays. Wrapping the result in a list leads to several other problems: for example, the whole dataframe ends up with dtype “object” after the transformation, even if every column had the same type. Feeding such a dataframe into keras then takes messy conversion work to get the data into the correct format.
groupedDf = df[['id', 'vec', 'label']].groupby('id', as_index=False).agg(
    {'label': 'mean',
     'vec': lambda x: return_some_numpy_array(x)})
Using list() as a wrapper works, but it would be more convenient to return the numpy array directly without having to unwrap the list afterwards.
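For reference, here is a minimal sketch of the list() workaround described above, including the unwrapping step. The sample data and the helper `mean_vector` are hypothetical stand-ins for my real data and for `return_some_numpy_array`; the exact behaviour may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each row holds a fixed-length vector.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "vec": [
        np.array([1.0, 2.0, 3.0]),
        np.array([3.0, 4.0, 5.0]),
        np.array([5.0, 6.0, 7.0]),
        np.array([7.0, 8.0, 9.0]),
    ],
    "label": [0.0, 1.0, 1.0, 1.0],
})


def mean_vector(series):
    """Hypothetical aggregation: element-wise mean of a group's vectors."""
    return np.stack(series.tolist()).mean(axis=0)


# Workaround: return a plain list instead of an ndarray from the lambda.
grouped = df[["id", "vec", "label"]].groupby("id", as_index=False).agg(
    {"label": "mean", "vec": lambda x: mean_vector(x).tolist()}
)

# The "vec" column now has object dtype, so it has to be unwrapped
# into a regular 2-D float array before passing it to a model.
X = np.stack(grouped["vec"].tolist())
y = grouped["label"].to_numpy()
```

The extra `np.stack` step at the end is exactly the unwrapping I would like to avoid.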
Related code:
generic.py[908]
if isinstance(output, (Series, Index, np.ndarray)):
    raise Exception('Must produce aggregated value')
Does anyone have another idea for getting around this issue?