API/ENH: idxmax groupby API #8717

Open
@jreback

from SO

Thinking that this is a nice general idiom that might deserve a function:

df.loc[df.groupby(level='host').idxmax()['no']]

maybe:
a) df.groupby(level='host').idxmax(loc='no') ? (e.g. have the idxmax/idxmin functions take a 'loc' argument; or is this bending the API a bit much?)
b) df.groupby(level='host').loc('no').idxmax()
c) df.groupby(level='host').loc[lambda x: x.idxmax()['no']]
d) df.groupby(level='host').loc('no','idxmax')

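import pandas as pd

df_logfile = pd.DataFrame({
    'host' : ['this.com', 'this.com', 'this.com', 'that.com', 'other.net',
              'other.net', 'other.net'],
    'service' : ['mail', 'mail', 'web', 'mail', 'mail', 'web', 'web' ] })

# count log entries per (host, service); named aggregation is used here because
# the dict-renaming form agg({'no':'count'}) is rejected by newer pandas versions
df = df_logfile.groupby(['host','service'])['service'].agg(no='count')
# per host, the (host, service) index label of the row with the highest count
mask = df.groupby(level=0).agg('idxmax')
# select those rows, then turn the index levels back into columns
df_count = df.loc[mask['no']]
df_count = df_count.reset_index()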

# yields
        host service  no
0  other.net     web   2
1   that.com    mail   1
2   this.com    mail   2
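
For discussion, here is a minimal sketch of what a function wrapping this idiom could look like. The name rows_at_group_max and its parameters are hypothetical and not part of pandas; it only combines the existing groupby(...).idxmax() and .loc calls, and it builds the counts with size() rather than agg.

```python
import pandas as pd


def rows_at_group_max(df, level, column):
    """Hypothetical helper (not a pandas API): for each group defined by an
    index level, return the row whose `column` value is largest."""
    # idxmax gives, per group, the index label of the row holding the maximum
    labels = df.groupby(level=level)[column].idxmax()
    # select those rows from the original frame by label
    return df.loc[labels.tolist()]


df_logfile = pd.DataFrame({
    'host': ['this.com', 'this.com', 'this.com', 'that.com', 'other.net',
             'other.net', 'other.net'],
    'service': ['mail', 'mail', 'web', 'mail', 'mail', 'web', 'web']})

# count rows per (host, service) pair and store the count in a column 'no'
df = df_logfile.groupby(['host', 'service']).size().to_frame('no')

result = rows_at_group_max(df, level='host', column='no')
print(result.reset_index())
```

Because idxmax returns the first occurrence of the maximum, ties within a group resolve to the first matching row in index order.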
