@@ -57,6 +57,10 @@ previous row in the DataFrame.
5757 Setting Parameters
5858------------------
5959
60+
61+ Ordering
62+ ^^^^^^^^
63+
6064You can control the order in which rows are processed by window functions by providing
6165a list of ``order_by`` functions for the ``order_by`` parameter.
6266
@@ -66,28 +70,114 @@ a list of ``order_by`` functions for the ``order_by`` parameter.
6670         col('"Name"'),
6771         col('"Attack"'),
6872         col('"Type 1"'),
69-         f.rank()
70-         .partition_by(col('"Type 1"'))
71-         .order_by(col('"Attack"').sort(ascending=True))
72-         .build()
73-         .alias("rank"),
74-     ).sort(col('"Type 1"').sort(), col('"Attack"').sort())
73+         f.rank(
74+             partition_by=[col('"Type 1"')],
75+             order_by=[col('"Attack"').sort(ascending=True)],
76+         ).alias("rank"),
77+     ).sort(col('"Type 1"'), col('"Attack"'))
78+
79+ Partitions
80+ ^^^^^^^^^^
81+
82+ A window function can take a list of ``partition_by`` columns, similar to an
83+ :ref:`Aggregation Function<aggregation>`. This causes the window values to be evaluated
84+ independently for each partition. In the example above, we found the rank of each
85+ Pokemon within its ``Type 1`` partition. We can see the first couple of rows of each
86+ partition if we do the following:
87+
88+ .. ipython:: python
89+
90+     df.select(
91+         col('"Name"'),
92+         col('"Attack"'),
93+         col('"Type 1"'),
94+         f.rank(
95+             partition_by=[col('"Type 1"')],
96+             order_by=[col('"Attack"').sort(ascending=True)],
97+         ).alias("rank"),
98+     ).filter(col("rank") < lit(3)).sort(col('"Type 1"'), col("rank"))
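
To make the partitioning behaviour concrete, here is a minimal plain-Python sketch of what a partitioned rank computes. It does not use DataFusion, and the sample rows and the ``rank_per_partition`` helper are hypothetical names invented for illustration only.

```python
from itertools import groupby

# Hypothetical sample rows: (name, attack, type_1)
rows = [
    ("Caterpie", 30, "Bug"),
    ("Weedle", 35, "Bug"),
    ("Metapod", 20, "Bug"),
    ("Charmander", 52, "Fire"),
    ("Vulpix", 41, "Fire"),
    ("Growlithe", 70, "Fire"),
]


def rank_per_partition(rows):
    """Return (name, attack, type_1, rank); rank restarts in each partition."""
    out = []
    # Sort by the partition key first, then by the ordering column.
    ordered = sorted(rows, key=lambda r: (r[2], r[1]))
    for _, group in groupby(ordered, key=lambda r: r[2]):
        prev_attack, rank = None, 0
        for position, (name, attack, type_1) in enumerate(group, start=1):
            # Ties share a rank; the next distinct value jumps to its position.
            if attack != prev_attack:
                rank = position
                prev_attack = attack
            out.append((name, attack, type_1, rank))
    return out


ranked = rank_per_partition(rows)
# Equivalent in spirit to the filter on rank < 3 above.
top_two = [r for r in ranked if r[3] < 3]
```

Each partition starts again at rank 1, so ``top_two`` holds the two lowest-attack rows of every type.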
99+
100+ Window Frame
101+ ^^^^^^^^^^^^
102+
103+ When using aggregate functions, the Window Frame defines the rows over which the function
104+ operates. If you do not specify a Window Frame, the frame will be set depending on the
105+ following criteria.
106+
107+ * If an ``order_by`` clause is set, the default window frame is defined as the rows between
108+   unbounded preceding and the current row.
109+ * If an ``order_by`` is not set, the default frame is defined as the rows between unbounded
110+   preceding and unbounded following (the entire partition).
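
The effect of the two defaults can be sketched in plain Python for a ``sum`` aggregate. This is an illustration of the frame semantics described above, not DataFusion code; the ``values`` list is a hypothetical partition whose entries are distinct, so ties do not come into play.

```python
# Hypothetical partition, already sorted by the order_by expression.
values = [10, 20, 30, 40]

# With order_by: the frame runs from the start of the partition to the
# current row, so a sum becomes a running total.
running = [sum(values[: i + 1]) for i in range(len(values))]

# Without order_by: the frame is the entire partition, so every row
# sees the same total.
whole = [sum(values) for _ in values]
```

Here ``running`` grows row by row while ``whole`` repeats the partition total on every row.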
111+
112+ Window Frames are defined by three parameters: unit type, starting bound, and ending bound.
113+
114+ The unit types available are:
75115
76- Window Functions can be configured using a builder approach to set a few parameters.
77- To create a builder you simply need to call any one of these functions
116+ * Rows: The starting and ending boundaries are defined by the number of rows relative to the
117+   current row.
118+ * Range: When using Range, the ``order_by`` clause must have exactly one term. The boundaries
119+   are defined by how close the rows are to the value of the expression in the ``order_by``
120+   parameter.
121+ * Groups: A "group" is the set of all rows that have equivalent values for all terms in the
122+   ``order_by`` clause.
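
The difference between the three unit types is easiest to see by listing which rows land in the frame. The following plain-Python sketch assumes a single numeric ``order_by`` column that is already sorted; ``frame_indices`` is a hypothetical helper written for this illustration and is not part of DataFusion.

```python
def frame_indices(values, i, unit, preceding, following):
    """Indices of the rows in the frame of row i (values sorted by order_by)."""
    n = len(values)
    if unit == "rows":
        # Count physical rows relative to the current row.
        lo, hi = max(0, i - preceding), min(n - 1, i + following)
        return list(range(lo, hi + 1))
    if unit == "range":
        # Compare order_by values against the current row's value.
        lo_v, hi_v = values[i] - preceding, values[i] + following
        return [j for j, v in enumerate(values) if lo_v <= v <= hi_v]
    if unit == "groups":
        # Number the peer groups (runs of equal values), then count groups.
        group_ids = []
        for j, v in enumerate(values):
            if j > 0 and v == values[j - 1]:
                group_ids.append(group_ids[-1])
            else:
                group_ids.append(group_ids[-1] + 1 if group_ids else 0)
        g = group_ids[i]
        return [
            j for j, gid in enumerate(group_ids)
            if g - preceding <= gid <= g + following
        ]
    raise ValueError(f"unknown unit: {unit}")


speeds = [10, 20, 20, 25, 40]  # hypothetical sorted order_by values

rows_frame = frame_indices(speeds, 3, "rows", 1, 0)      # one row back
range_frame = frame_indices(speeds, 3, "range", 5, 0)    # values in [20, 25]
groups_frame = frame_indices(speeds, 3, "groups", 1, 0)  # one peer group back
```

For the row with value 25, a one-row frame reaches only the previous row, while a range of 5 or one preceding group both pull in the two tied rows with value 20.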
78123
79- - :py:func:`datafusion.expr.Expr.order_by` to set the window ordering.
80- - :py:func:`datafusion.expr.Expr.null_treatment` to set how ``null`` values should be handled.
81- - :py:func:`datafusion.expr.Expr.partition_by` to set the partitions for processing.
82- - :py:func:`datafusion.expr.Expr.window_frame` to set boundary of operation.
124+ In this example we perform a "rolling average" of the speed of the current Pokemon and the
125+ two preceding rows.
83126
84- After these parameters are set, you must call ``build()`` on the resultant object to get an
85- expression as shown in the example above.
127+ .. ipython:: python
128+
129+     from datafusion.expr import WindowFrame
130+
131+     df.select(
132+         col('"Name"'),
133+         col('"Speed"'),
134+         f.window("avg",
135+             [col('"Speed"')],
136+             order_by=[col('"Speed"')],
137+             window_frame=WindowFrame("rows", 2, 0)
138+         ).alias("Previous Speed")
139+     )
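
As a rough check on what a ``rows`` frame of two preceding rows through the current row produces, the same rolling average can be sketched in plain Python. The ``rolling_avg`` helper and the sample speeds are hypothetical and only illustrate the arithmetic.

```python
def rolling_avg(values, preceding=2):
    """Average of each value and up to `preceding` rows before it."""
    out = []
    for i in range(len(values)):
        # The frame is clipped at the start of the partition.
        window = values[max(0, i - preceding): i + 1]
        out.append(sum(window) / len(window))
    return out


speeds = [30, 45, 60, 90]  # hypothetical values, already sorted by Speed
averages = rolling_avg(speeds)
```

The first two rows average over fewer than three values because the frame cannot extend before the start of the partition.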
140+
141+ Null Treatment
142+ ^^^^^^^^^^^^^^
143+
144+ When using aggregate functions as window functions, it is often useful to specify how null values
145+ should be treated. In order to do this you need to use the builder function. In future releases
146+ we expect this to be simplified in the interface.
147+
148+ One common usage for handling nulls is the case where you want to find the last value up to the
149+ current row. In the following example we demonstrate how setting the null treatment to ignore
150+ nulls will fill in with the value of the most recent non-null row. To do this, we also will set
151+ the window frame so that we only process up to the current row.
152+
153+ In this example, we filter down to one specific type of Pokemon that does have some entries in
154+ its ``Type 2`` column that are null.
155+
156+ .. ipython:: python
157+
158+     from datafusion.common import NullTreatment
159+
160+     df.filter(col('"Type 1"') == lit("Bug")).select(
161+         '"Name"',
162+         '"Type 2"',
163+         f.window("last_value", [col('"Type 2"')])
164+             .window_frame(WindowFrame("rows", None, 0))
165+             .order_by(col('"Speed"'))
166+             .null_treatment(NullTreatment.IGNORE_NULLS)
167+             .build()
168+             .alias("last_wo_null"),
169+         f.window("last_value", [col('"Type 2"')])
170+             .window_frame(WindowFrame("rows", None, 0))
171+             .order_by(col('"Speed"'))
172+             .null_treatment(NullTreatment.RESPECT_NULLS)
173+             .build()
174+             .alias("last_with_null")
175+     )
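
The two null treatments can be compared with a small plain-Python sketch of ``last_value`` over a frame that ends at the current row. It is an illustration of the semantics only; the ``last_value`` helper below and the sample column are hypothetical, with ``None`` standing in for null.

```python
def last_value(values, ignore_nulls):
    """Last value in the frame from the partition start to the current row."""
    out, last_non_null = [], None
    for v in values:
        if v is not None:
            last_non_null = v
        # Ignoring nulls carries the most recent non-null value forward;
        # respecting nulls simply returns the current row's value.
        out.append(last_non_null if ignore_nulls else v)
    return out


type_2 = ["Poison", None, None, "Flying", None]  # hypothetical column values

last_wo_null = last_value(type_2, ignore_nulls=True)
last_with_null = last_value(type_2, ignore_nulls=False)
```

With nulls ignored the gaps are filled from the most recent non-null row, while respecting nulls leaves them in place.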
86176
87177 Aggregate Functions
88178-------------------
89179
90- You can use any :ref:`Aggregation Function<aggregation>` as a window function. Currently
180+ You can use any :ref:`Aggregation Function<aggregation>` as a window function. Currently
91181aggregate functions must use the deprecated
92182:py:func:`datafusion.functions.window` API but this should be resolved in
93183DataFusion 42.0 (`Issue Link <https://github.com/apache/datafusion-python/issues/833>`_). Here