From Stack Overflow.
Thinking that this is a nice general idiom that might deserve a function:
df.loc[df.groupby(level='host').idxmax()['no']]
maybe:
a) df.groupby(level='host').idxmax(loc='no')
(i.e. have the idxmax/idxmin functions take a 'loc' argument; or is this bending the API a bit much?)
b) df.groupby(level='host').loc('no').idxmax()
c) df.groupby(level='host').loc[lambda x: x.idxmax()['no']]
d) df.groupby(level='host').loc('no','idxmax')
import pandas as pd

df_logfile = pd.DataFrame({
    'host': ['this.com', 'this.com', 'this.com', 'that.com', 'other.net',
             'other.net', 'other.net'],
    'service': ['mail', 'mail', 'web', 'mail', 'mail', 'web', 'web']})
# count log entries per (host, service) into a column named 'no'
df = df_logfile.groupby(['host', 'service'])['service'].agg(no='count')
# for each host, the index label of the row with the highest count
mask = df.groupby(level=0).agg('idxmax')
df_count = df.loc[mask['no']]
df_count = df_count.reset_index()
# yields
        host service  no
0  other.net     web   2
1   that.com    mail   1
2   this.com    mail   2
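
For discussion, here is a rough sketch of what any of the proposed spellings might boil down to. The helper name `loc_by_idxmax` and its `level`/`column` parameters are hypothetical and not part of pandas; the sketch only wraps the existing `groupby(...).idxmax()` plus `.loc` idiom, and assumes the group maximum is what is wanted (ties resolve to the first occurrence, as `idxmax` does).

```python
import pandas as pd


def loc_by_idxmax(df, level, column):
    """Hypothetical helper (not a pandas API): for each group of `level`,
    return the row of `df` where `column` is largest."""
    # idxmax yields one index label per group (a tuple for a MultiIndex)
    labels = df.groupby(level=level)[column].idxmax()
    # pass plain labels to .loc to select exactly those rows
    return df.loc[list(labels)]


# the same counts as in the example above, built directly
counts = pd.DataFrame(
    {"no": [1, 2, 1, 2, 1]},
    index=pd.MultiIndex.from_tuples(
        [
            ("other.net", "mail"),
            ("other.net", "web"),
            ("that.com", "mail"),
            ("this.com", "mail"),
            ("this.com", "web"),
        ],
        names=["host", "service"],
    ),
)

top = loc_by_idxmax(counts, level="host", column="no")
```

An `idxmin` variant would only differ in the reduction called, which is one argument for option (d), where the reduction is passed by name.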