
Commit 2d64fb5

ENH add cython tutorial
1 parent 111ff2b commit 2d64fb5

File tree

2 files changed: +224 -2 lines

doc/source/cython.rst

Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@

.. _cython:

.. currentmodule:: pandas

.. ipython:: python
   :suppress:

   import os
   import csv
   from pandas import DataFrame
   import pandas as pd

   import numpy as np
   np.random.seed(123456)
   randn = np.random.randn
   randint = np.random.randint
   np.set_printoptions(precision=4, suppress=True)


****************************************
Cython (Writing C extensions for pandas)
****************************************

For many use cases writing pandas in pure python and numpy is sufficient. In some computationally heavy applications, however, it is possible to achieve sizeable speed-ups by offloading work to `cython <http://cython.org/>`_.

This is a tutorial aimed at somewhat more "advanced" users.

.. note::

   The first thing to do here is to see if we can refactor in python, removing for loops in a way which could make use of numpy; a trivial illustration of that kind of refactoring follows.
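
For instance (a minimal sketch added here; the array and names are made up for the example), a python loop can often be replaced by a vectorized numpy expression that does the same work in C:

.. code-block:: python

   import numpy as np

   arr = np.random.randn(1000)

   # pure-python loop: one interpreter iteration per element
   total = 0.0
   for x in arr:
       total += x * (x - 1)

   # vectorized equivalent: the whole computation happens inside numpy
   total = (arr * (arr - 1)).sum()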

This tutorial walks through a "typical" process of cythonizing a slow computation, using an `example from the cython documentation <http://docs.cython.org/src/quickstart/cythonize.html>`_ in the context of pandas:

We have a function, ``integrate_f``, which we want to apply row-wise across a DataFrame, ``df``:

.. ipython:: python

   df = DataFrame({'x': 'x', 'a': randn(1000), 'b': randn(1000),
                   'N': randint(100, 1000, (1000))})
   df

.. ipython:: python

   def f(x):
       return x * (x - 1)

   def integrate_f(a, b, N):
       s = 0
       dx = (b - a) / N
       for i in range(N):
           s += f(a + i * dx)
       return s * dx
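
As a quick sanity check (an illustration added here, not part of the original tutorial), ``integrate_f`` computes a left Riemann sum for ``f``; since ``f(x) = x * (x - 1)``, an antiderivative is ``x**3/3 - x**2/2``, so the approximation can be compared against the exact value:

.. code-block:: python

   def integrate_f_exact(a, b):
       # antiderivative of x * (x - 1) is x**3/3 - x**2/2
       F = lambda x: x ** 3 / 3.0 - x ** 2 / 2.0
       return F(b) - F(a)

   integrate_f(0.0, 1.0, 1000)   # approximately -1/6
   integrate_f_exact(0.0, 1.0)   # exactly -1/6, about -0.1667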

In pure pandas we might achieve this using a row-wise ``apply``:

.. ipython:: python

   %timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)

Clearly this isn't fast enough for us, so let's take a look and see where the time is spent performing this operation (limited to the four most time-consuming calls) using the `prun ipython magic function <http://ipython.org/ipython-doc/stable/api/generated/IPython.core.magics.execution.html#IPython.core.magics.execution.ExecutionMagics.prun>`_:

.. ipython:: python

   %prun -l 4 df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)

By far the majority of time is spent inside either ``integrate_f`` or ``f``, so we concentrate our efforts on cythonizing these two functions.

.. note::

   In python 2, replacing ``range`` with its lazy counterpart ``xrange`` would make the ``range`` line vanish from the profile. In python 3, ``range`` is already lazy.

First, let's simply copy our function over to cython as is (here the ``_plain`` suffix stands for "plain cython", allowing us to distinguish between our cython functions):

.. ipython:: python

   %load_ext cythonmagic

.. ipython::

   In [2]: %%cython
      ...: def f_plain(x):
      ...:     return x * (x - 1)
      ...: def integrate_f_plain(a, b, N):
      ...:     s = 0
      ...:     dx = (b - a) / N
      ...:     for i in range(N):
      ...:         s += f_plain(a + i * dx)
      ...:     return s * dx
      ...:

.. ipython:: python

   %timeit df.apply(lambda x: integrate_f_plain(x['a'], x['b'], x['N']), axis=1)

We've already shaved a third off, not too bad for a simple copy and paste. We'll get another huge improvement simply by providing type information:

.. ipython::

   In [3]: %%cython
      ...: cdef double f_typed(double x) except? -2:
      ...:     return x * (x - 1)
      ...: cpdef double integrate_f_typed(double a, double b, int N):
      ...:     cdef int i
      ...:     cdef double s, dx
      ...:     s = 0
      ...:     dx = (b - a) / N
      ...:     for i in range(N):
      ...:         s += f_typed(a + i * dx)
      ...:     return s * dx
      ...:

.. ipython:: python

   %timeit df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1)

Now, we're talking! Already we're over ten times faster than the original python version, and we haven't *really* modified the code. Let's go back and have another look at what's eating up time now:

.. ipython:: python

   %prun -l 4 df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1)

It's calling ``Series`` and ``DataFrame`` code... a lot; in fact, they're getting called for every row in the DataFrame. Function calls are expensive in python, so maybe we should cythonize the apply part and see if we can minimise these.

We are now passing ndarrays into the cython function; fortunately cython plays very nicely with numpy. Note the use of ``Py_ssize_t`` below: it is the signed integer type python itself uses for indexing, and the appropriate type for array indices in cython.

.. ipython::

   In [4]: %%cython
      ...: cimport numpy as np
      ...: import numpy as np
      ...: cdef double f_typed(double x) except? -2:
      ...:     return x**2 - x
      ...: cpdef double integrate_f_typed(double a, double b, int N):
      ...:     cdef int i
      ...:     cdef double s, dx
      ...:     s = 0
      ...:     dx = (b - a) / N
      ...:     for i in range(N):
      ...:         s += f_typed(a + i * dx)
      ...:     return s * dx
      ...: cpdef np.ndarray[double] apply_integrate_f(np.ndarray col_a, np.ndarray col_b, np.ndarray col_N):
      ...:     assert (col_a.dtype == np.float64 and col_b.dtype == np.float64 and col_N.dtype == np.int64)
      ...:     cdef Py_ssize_t i, n = len(col_N)
      ...:     assert (len(col_a) == len(col_b) == n)
      ...:     cdef np.ndarray[double] res = np.empty(n)
      ...:     for i in range(n):
      ...:         res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i])
      ...:     return res
      ...:

We create an empty results array and loop over the rows, applying our ``integrate_f_typed`` function to fill it up. It's worth mentioning here that although a loop like this would be extremely slow in python, looping over a numpy array in cython is *fast*.

.. ipython:: python

   %timeit apply_integrate_f(df['a'], df['b'], df['N'])

We've gone another three times faster! Let's check again where the time is spent:

.. ipython:: python

   %prun -l 4 apply_integrate_f(df['a'], df['b'], df['N'])

As one might expect, the majority of the time is now spent in ``apply_integrate_f``, so if we wanted to squeeze out any more efficiency we must continue to concentrate our efforts here.

One more trick is available: the ``@cython.boundscheck(False)`` and ``@cython.wraparound(False)`` decorators disable bounds checking and negative-index wraparound on array access, removing per-element overhead inside the loop (at the cost of crashing rather than raising if an index is out of range):

.. ipython::

   In [5]: %%cython
      ...: cimport cython
      ...: cimport numpy as np
      ...: import numpy as np
      ...: cdef double f_typed(double x) except? -2:
      ...:     return x**2 - x
      ...: cpdef double integrate_f_typed(double a, double b, int N):
      ...:     cdef int i
      ...:     cdef double s, dx
      ...:     s = 0
      ...:     dx = (b - a) / N
      ...:     for i in range(N):
      ...:         s += f_typed(a + i * dx)
      ...:     return s * dx
      ...: @cython.boundscheck(False)
      ...: @cython.wraparound(False)
      ...: cpdef np.ndarray[double] apply_integrate_f_wrap(np.ndarray[double] col_a, np.ndarray[double] col_b, np.ndarray[Py_ssize_t] col_N):
      ...:     cdef Py_ssize_t i, n = len(col_N)
      ...:     assert len(col_a) == len(col_b) == n
      ...:     cdef np.ndarray[double] res = np.empty(n)
      ...:     for i in range(n):
      ...:         res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i])
      ...:     return res
      ...:

.. ipython:: python

   %timeit apply_integrate_f_wrap(df['a'], df['b'], df['N'])

Again we've shaved another third off, so let's have a look at where the time is spent:

.. ipython:: python

   %prun -l 4 apply_integrate_f_wrap(df['a'], df['b'], df['N'])

We can see that now all the time appears to be spent in ``apply_integrate_f_wrap`` and not much anywhere else. It would make sense to continue looking here for efficiencies...

The same typed-loop pattern extends naturally to higher-dimensional data; see the 2D sketch below.
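
For instance (a minimal sketch added here, not timed in the original; ``apply_f_2d`` is a made-up name), summing ``f_typed`` over each row of a 2D ndarray:

.. code-block:: python

   %%cython
   cimport cython
   cimport numpy as np
   import numpy as np

   cdef double f_typed(double x) except? -2:
       return x * (x - 1)

   @cython.boundscheck(False)
   @cython.wraparound(False)
   cpdef np.ndarray[double] apply_f_2d(np.ndarray[double, ndim=2] arr):
       # accumulate f over each row of a 2D array
       cdef Py_ssize_t i, j
       cdef Py_ssize_t n = arr.shape[0]
       cdef Py_ssize_t m = arr.shape[1]
       cdef np.ndarray[double] res = np.zeros(n)
       for i in range(n):
           for j in range(m):
               res[i] += f_typed(arr[i, j])
       return res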

Using cython has made our calculation around 100 times faster than the original pure-python version, and yet we're left with something which doesn't look too dissimilar.

.. note::

   You don't need to cythonize every function (!). Profile first, and only offload the genuinely hot spots to cython.

Further topics:

- One can also load in functions from other C modules you've already written (see the sketch below).
- More??
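
For example (a hypothetical sketch added here; ``my_math.h`` and ``c_integrate_f`` are stand-in names for your own C code), an external C function can be declared and wrapped in cython:

.. code-block:: python

   %%cython
   # declare a function from an existing C header (names are hypothetical)
   cdef extern from "my_math.h":
       double c_integrate_f(double a, double b, int N)

   def integrate_f_c(double a, double b, int N):
       # thin python-callable wrapper around the external C function
       return c_integrate_f(a, b, N)
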
Read more in the `cython docs <http://docs.cython.org/>`_.

doc/sphinxext/ipython_directive.py

Lines changed: 5 additions & 2 deletions
@@ -296,11 +296,14 @@ def process_input(self, data, input_prompt, lineno):
         is_savefig = decorator is not None and \
                      decorator.startswith('@savefig')

-        input_lines = input.split('\n')
+        def _remove_first_space_if_any(line):
+            return line[1:] if line.startswith(' ') else line
+
+        input_lines = map(_remove_first_space_if_any, input.split('\n'))

         self.datacontent = data

-        continuation = '   %s:'%''.join(['.']*(len(str(lineno))+2))
+        continuation = '   %s: '%''.join(['.']*(len(str(lineno))+2))

         if is_savefig:
             image_file, image_directive = self.process_image(decorator)
