This repository was archived by the owner on Apr 24, 2020. It is now read-only.

Commit 14e325e (1 parent: 7dd1bbc)

Add parallelization lecture (#719)

* getting started
* added more discussion
* misc
* final edits from js on new sci lectures
* added fig

6 files changed: +726, -694 lines

source/rst/index_python_scientific_libraries.rst (1 addition, 1 deletion)

@@ -23,4 +23,4 @@ Next we cover the third party libraries most useful for scientific work in Python
 matplotlib
 scipy
 numba
-sci_libs
+parallelization

source/rst/need_for_speed.rst (29 additions, 17 deletions)
@@ -61,11 +61,14 @@ routines we want to use.
 For example, it's almost always better to use an existing routine for root
 finding than to write a new one from scratch.

+(For standard algorithms, efficiency is maximized if the community can coordinate on a
+common set of implementations, written by experts and tuned by users to be as fast and robust as possible.)
+
 But this is not the only reason that we use Python's scientific libraries.

 Another is that pure Python, while flexible and elegant, is not fast.

-So we need libraries that help us accelerate our Python code.
+So we need libraries that are designed to accelerate execution of Python code.

 As we'll see below, there are now Python libraries that can do this extremely well.

@@ -131,17 +134,19 @@ Indeed, the standard implementation of Python (called CPython) cannot match the

 Does that mean that we should just switch to C or Fortran for everything?

-The answer is: no, no and one hundred times no!
+The answer is: No, no and one hundred times no!
+
+(This is what you should say to the senior professor insisting that the model
+needs to be rewritten in Fortran or C++.)

 There are two reasons why:

 First, for any given program, relatively few lines are ever going to
 be time-critical.

-Hence we should write most of our code in a high productivity language like
-Python.
+Hence it is far more efficient to write most of our code in a high productivity language like Python.

-Second, for those lines of code that *are* time-critical, we can now achieve the same speed as C or Fortran using Python's scientific libraries.
+Second, even for those lines of code that *are* time-critical, we can now achieve the same speed as C or Fortran using Python's scientific libraries.


 Where are the Bottlenecks?

@@ -150,6 +155,8 @@ Where are the Bottlenecks?
 Before we learn how to do this, let's try to understand why plain vanilla
 Python is slower than C or Fortran.

+This will, in turn, help us figure out how to speed things up.
+

 Dynamic Typing
 ^^^^^^^^^^^^^^

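Note: the "Dynamic Typing" heading that closes this hunk introduces the first bottleneck the lecture discusses. As a brief illustration of the cost (not part of the diff), every use of + in pure Python triggers a runtime type dispatch:

    # In a + b, CPython must inspect the types of a and b on every
    # evaluation before it can select the right addition routine
    a, b = 1, 2          # int + int  -> integer addition
    c = a + b

    a, b = 'foo', 'bar'  # str + str  -> string concatenation
    c = a + b

    # A C compiler resolves this choice once, at compile time;
    # CPython pays the dispatch cost on every pass through a loop
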
@@ -281,16 +288,19 @@ Let's look at some ways around these problems.
 There is a clever method called **vectorization** that can be
 used to speed up high level languages in numerical applications.

-The key idea is to send array processing operations in batch to precompiled
+The key idea is to send array processing operations in batch to pre-compiled
 and efficient native machine code.

 The machine code itself is typically compiled from carefully optimized C or Fortran.

+For example, when working in a high level language, the operation of inverting a large matrix can be subcontracted to efficient machine code that is pre-compiled for this purpose and supplied to users as part of a package.
+
 This clever idea dates back to MATLAB, which uses vectorization extensively.

-Vectorization can greatly accelerate many (but not all) numerical computations.
+Vectorization can greatly accelerate many numerical computations (but not all,
+as we shall see).

-Let's see how it works in Python, using NumPy.
+Let's see how vectorization works in Python, using NumPy.


 Operations on Arrays
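
As a concrete illustration of the batching idea in this hunk, here is a minimal sketch (not part of the diff; the array size and matrix dimensions are arbitrary) comparing an interpreted loop with the equivalent single NumPy call:

    import numpy as np

    n = 1_000_000
    x = np.random.uniform(0, 1, n)

    # Interpreted: each iteration pays Python's dispatch overhead
    total = 0.0
    for xi in x:
        total += xi * xi

    # Vectorized: one batched call; the loop runs in pre-compiled C
    total_vec = x @ x  # equivalent to np.sum(x * x)

    # Matrix inversion is likewise delegated to pre-compiled LAPACK code
    A_inv = np.linalg.inv(np.random.randn(500, 500))

On typical hardware the batched call runs orders of magnitude faster than the interpreted loop, because all the looping happens inside NumPy's compiled routines.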
@@ -471,13 +481,15 @@ In the vectorized version, all the looping takes place in compiled code.

 As you can see, the second version is **much** faster.

-(We'll make it even faster again below when we discuss Numba)
+(We'll make it even faster again later on, using more scientific programming tricks.)
+


 .. _numba-p_c_vectorization:

-Pros and Cons of Vectorization
-------------------------------
+Beyond Vectorization
+====================
+

 At its best, vectorization yields fast, simple code.

@@ -488,17 +500,17 @@ One issue is that it can be highly memory-intensive.
 For example, the vectorized maximization routine above is far more memory
 intensive than the non-vectorized version that preceded it.

+This is because vectorization tends to create many intermediate arrays before
+producing the final calculation.
+
 Another issue is that not all algorithms can be vectorized.

 In these kinds of settings, we need to go back to loops.

-Fortunately, there are nice ways to speed up Python loops.
-
-
-Beyond Vectorization
-====================
+Fortunately, there are alternative ways to speed up Python loops that work in
+almost any setting.

-In the last few years, a new Python library called `Numba
+For example, in the last few years, a new Python library called `Numba
 <http://numba.pydata.org/>`__ has appeared that solves the main problems
 with vectorization listed above.
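
To make the two claims in this hunk concrete (that vectorized code builds intermediate arrays, and that compiled loops avoid them), here is a minimal sketch (not part of the diff) using Numba's @njit decorator; the objective function and grid size are stand-ins chosen for illustration:

    import numpy as np
    from numba import njit

    def max_vectorized(grid):
        # Vectorized 2D maximization: fast, but allocates several large
        # intermediate arrays (x, y, and the full grid of function values)
        x, y = np.meshgrid(grid, grid)
        return np.max(np.cos(x**2 + y**2) / (1 + x**2 + y**2))

    @njit  # Numba compiles the loops to machine code on first call
    def max_loops(grid):
        # The same maximization as explicit loops: no intermediate
        # arrays, and the loops run at native speed once compiled
        m = -np.inf
        for x in grid:
            for y in grid:
                z = np.cos(x**2 + y**2) / (1 + x**2 + y**2)
                if z > m:
                    m = z
        return m

    grid = np.linspace(-3, 3, 1000)
    print(max_vectorized(grid), max_loops(grid))

The vectorized version materializes three 1000x1000 arrays before reducing to a scalar; the loop version works one point at a time, which is the kind of "alternative way to speed up Python loops" the new text refers to.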
