Commit e554cd7

committed
using assign() + query() functions, changed SQL query for RANK() and corresponding Pandas expression
1 parent 6a4522c commit e554cd7
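
The commit message refers to chaining `assign()` and `query()` instead of writing a helper column into `tips` in place. A minimal sketch of that pattern follows; the small `tips` frame here is a hypothetical stand-in for the dataset used in the docs, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the `tips` dataset used in the docs.
tips = pd.DataFrame({
    "day": ["Sun", "Sun", "Sun", "Sat", "Sat"],
    "total_bill": [10.0, 30.0, 20.0, 15.0, 25.0],
})

# Row number within each day, highest total_bill first.
# The result keeps the original index, so assign() can align it back.
rn = (tips.sort_values("total_bill", ascending=False)
          .groupby("day")
          .cumcount() + 1)

# assign() returns a new frame; `tips` itself gains no `rn` column.
top2 = (tips.assign(rn=rn)
            .query("rn < 3")
            .sort_values(["day", "rn"]))

print(top2)
```

Because `assign()` returns a copy, the expression can be rerun without leaving helper columns behind in `tips`, which seems to be the motivation for the change.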

1 file changed: +25 -15 lines changed

doc/source/comparison_with_sql.rst

Lines changed: 25 additions & 15 deletions
@@ -407,39 +407,49 @@ Let's add a helper column: `RN` (Row Number)

 .. ipython:: python

-    tips['rn'] = (tips.sort_values(['total_bill'], ascending=False)
-                      .groupby(['day'])
-                      .cumcount() + 1
-                 )
-    tips.loc[tips['rn'] < 3].sort_values(['day','rn'])
+    (tips.assign(rn=tips.sort_values(['total_bill'], ascending=False)
+                        .groupby(['day'])
+                        .cumcount() + 1)
+         .query('rn < 3')
+         .sort_values(['day','rn'])
+    )

 the same using `rank(method='first')` function

 .. ipython:: python

-    tips['rnk'] = (tips.groupby(['day'])['total_bill']
-                       .rank(method='first', ascending=False)
-                  )
-    tips.loc[tips['rnk'] < 3].sort_values(['day','rnk'])
+    (tips.assign(rnk=tips.groupby(['day'])['total_bill']
+                         .rank(method='first', ascending=False))
+         .query('rnk < 3')
+         .sort_values(['day','rnk'])
+    )

 .. code-block:: sql

     -- Oracle's RANK() analytic function
     SELECT * FROM (
         SELECT
             t.*,
-            RANK() OVER(PARTITION BY day ORDER BY total_bill DESC) AS rnk
+            RANK() OVER(PARTITION BY sex ORDER BY tip) AS rnk
         FROM tips t
+        WHERE tip < 2
     )
     WHERE rnk < 3
-    ORDER BY day, rn;
+    ORDER BY sex, rnk;
+
+Let's find tips with (rank < 3) per gender group for (tips < 2).
+Notice that when using ``rank(method='min')`` function
+`rnk_min` remains the same for the same `tip`
+(as Oracle's RANK() function)

 .. ipython:: python

-    tips['rnk_min'] = (tips.groupby(['day'])['total_bill']
-                           .rank(method='min', ascending=False)
-                      )
-    tips.loc[tips['rnk_min'] < 3].sort_values(['day','rnk_min'])
+    (tips[tips['tip'] < 2]
+         .assign(rnk_min=tips.groupby(['sex'])['tip']
+                             .rank(method='min'))
+         .query('rnk_min < 3')
+         .sort_values(['sex','rnk_min'])
+    )


 UPDATE
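
The last hunk computes the rank on the unfiltered `tips` and relies on index alignment in `assign()`, whereas the SQL applies `WHERE tip < 2` before `RANK()`. For an ascending rank combined with a `tip < 2` filter the two should agree, since every filtered-out row ranks after the kept ones. The sketch below uses hypothetical toy data and a lambda inside `assign()` (an alternative spelling, not what the commit uses) to rank the filtered frame directly, and contrasts `method='first'` with `method='min'` on ties.

```python
import pandas as pd

# Hypothetical toy data; the real docs use the full `tips` dataset.
tips = pd.DataFrame({
    "sex": ["F", "F", "F", "M", "M", "M"],
    "tip": [1.0, 1.0, 1.5, 1.25, 1.25, 3.0],
})

# Filter first, mirroring the SQL's inner WHERE clause.
low = tips[tips["tip"] < 2]

# A callable passed to assign() receives the frame being assigned to,
# so both ranks are computed on the filtered rows only.
ranked = low.assign(
    rnk_first=lambda d: d.groupby("sex")["tip"].rank(method="first"),
    rnk_min=lambda d: d.groupby("sex")["tip"].rank(method="min"),
)

# The commit's spelling: rank on the full frame, aligned by index.
aligned = low.assign(rnk_min=tips.groupby("sex")["tip"].rank(method="min"))

print(ranked)
```

With `method='first'` tied tips receive distinct ranks in order of appearance, while `method='min'` gives them the same rank, which matches how the diff describes Oracle's `RANK()`.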

0 commit comments
