Code in groupby.generic.DataFrameGroupBy._transform_general:
pandas/pandas/core/groupby/generic.py
Lines 1299 to 1309 in b3e3352
    for name, group in gen:
        object.__setattr__(group, "name", name)
        # Try slow path and fast path.
        try:
            path, res = self._choose_path(fast_path, slow_path, group)
        except TypeError:
            return self._transform_item_by_item(obj, fast_path)
        except ValueError as err:
            msg = "transform must return a scalar value for each group"
            raise ValueError(msg) from err
This calls _choose_path for every group, which in turn runs both slow_path and fast_path to determine whether the fast path can be used. Indeed, running this code (from #41584):
    import pandas as pd

    df = pd.DataFrame({
        'x': ['a', 'b', 'c', 'd'],
        'y': [5, 6, 7, 8],
        'g': [1, 2, 3, 3]
    })

    def myfirst(c):
        # Return the first element of whatever is passed in (column or DataFrame).
        return c.iloc[0]

    print(df.groupby('g').transform(myfirst))
shows that myfirst gets called 9 times: 3 times with column x, 3 times with column y, and 3 times with the DataFrame consisting of x and y.
Should we just call _choose_path on the first group to determine which path can be used, and then reuse that path for the remaining groups?
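To make the difference in call counts concrete, here is a minimal sketch in plain Python. It is a toy model, not the pandas implementation: groups are plain dicts of column name to list of values, and `choose_path`, `transform`, and `count_calls` are hypothetical stand-ins written for illustration. It assumes the path that works for the first group is also valid for the remaining groups.

```python
calls = []  # records the type of every argument passed to myfirst


def myfirst(c):
    """Toy analogue of the issue's function: first element of a column,
    or the first element of each column when given a whole group."""
    calls.append(type(c).__name__)
    if isinstance(c, dict):
        return {col: vals[0] for col, vals in c.items()}
    return c[0]


def choose_path(fast_path, slow_path, group):
    """Hypothetical stand-in: run both paths, keep fast if results agree."""
    res = slow_path(group)
    try:
        res_fast = fast_path(group)
    except Exception:
        return slow_path, res
    if res_fast == res:
        return fast_path, res_fast
    return slow_path, res


def transform(groups, func, choose_once):
    fast_path = lambda g: func(g)  # one call on the whole group
    slow_path = lambda g: {col: func(vals) for col, vals in g.items()}  # one call per column
    path = None
    results = []
    for g in groups:
        if path is None or not choose_once:
            # Probe both paths (every group, or only the first one).
            path, res = choose_path(fast_path, slow_path, g)
        else:
            # Reuse the path selected on the first group.
            res = path(g)
        results.append(res)
    return results


def count_calls(choose_once):
    calls.clear()
    groups = [
        {"x": ["a"], "y": [5]},
        {"x": ["b"], "y": [6]},
        {"x": ["c", "d"], "y": [7, 8]},
    ]
    results = transform(groups, myfirst, choose_once)
    return len(calls), results


print(count_calls(choose_once=False)[0])  # probing every group
print(count_calls(choose_once=True)[0])   # probing only the first group
```

In this toy model, probing every group costs three calls per group, while probing only the first group costs three calls once and a single call for each later group. Whether the first group is always representative enough in pandas itself is the open question raised above.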