GroupBy aggregate: Must produce aggregated value #24016

@digital-thinking

I want to aggregate my dataframe by id, and this is not the first time I have come across this error. It seems to occur because returning a numpy array from the aggregation function is not allowed.
There are many use cases where rows are aggregated into numpy arrays, and wrapping the result in a list leads to other problems: for example, the whole dataframe ends up with dtype “object” after the transformation, even if every column had the same type. Feeding such a dataframe into keras then requires messy conversion to get the data into the correct format.

groupedDf = df[['id', 'vec', 'label']].groupby('id', as_index=False).agg({
    'label': 'mean',
    'vec': lambda x: return_some_numpy_array(x),
})

Wrapping the result in list() works, but it would be more convenient to return the numpy array directly instead of having to unwrap the list afterwards.
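For reference, here is a minimal sketch of the list workaround and the unwrapping step it forces. The sample data and the `mean_vector` helper are made up for illustration (a stand-in for `return_some_numpy_array`), and it assumes every aggregated vector has the same length.

```python
import numpy as np
import pandas as pd

# Made-up sample data: each row holds an id, a vector and a label.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "vec": [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])],
    "label": [0.0, 1.0, 1.0],
})


def mean_vector(vectors):
    """Hypothetical aggregation: element-wise mean of a group's vectors."""
    return np.stack(vectors.to_list()).mean(axis=0)


# Workaround: return a plain list instead of a numpy array from the lambda.
grouped = df[["id", "vec", "label"]].groupby("id", as_index=False).agg({
    "label": "mean",
    "vec": lambda x: mean_vector(x).tolist(),
})

# The "vec" column has dtype object, so it has to be unwrapped by hand
# into a 2-D float array of shape (n_groups, vector_length).
X = np.stack(grouped["vec"].to_list())
y = grouped["label"].to_numpy()
```

`X` and `y` are then ordinary numeric arrays, but the extra `np.stack` step is exactly the unwrapping I would like to avoid.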

Related code:
generic.py[908]

if isinstance(output, (Series, Index, np.ndarray)):
   raise Exception('Must produce aggregated value')

Does anyone have another idea for working around this issue?
