diff --git a/lectures/jax_intro.md b/lectures/jax_intro.md
index 1c0d817e..c9b5f4c3 100644
--- a/lectures/jax_intro.md
+++ b/lectures/jax_intro.md
@@ -18,7 +18,7 @@ In addition to what's in Anaconda, this lecture will need the following librarie
 ```{code-cell} ipython3
 :tags: [hide-output]

-!pip install jax
+!pip install jax quantecon
 ```

 This lecture provides a short introduction to [Google JAX](https://github.com/google/jax).
@@ -52,6 +52,7 @@ The following import is standard, replacing `import numpy as np`:
 ```{code-cell} ipython3
 import jax
 import jax.numpy as jnp
+import quantecon as qe
 ```

 Now we can use `jnp` in place of `np` for the usual array operations:
@@ -304,7 +305,8 @@ x = jnp.ones(n)
 How long does the function take to execute?

 ```{code-cell} ipython3
-%time f(x).block_until_ready()
+with qe.Timer():
+    f(x).block_until_ready()
 ```

 ```{note}
@@ -318,7 +320,8 @@ allows the Python interpreter to run ahead of numerical computations.
 If we run it a second time it becomes faster again:

 ```{code-cell} ipython3
-%time f(x).block_until_ready()
+with qe.Timer():
+    f(x).block_until_ready()
 ```

 This is because the built in functions like `jnp.cos` are JIT compiled and the
@@ -341,7 +344,8 @@ y = jnp.ones(m)
 ```

 ```{code-cell} ipython3
-%time f(y).block_until_ready()
+with qe.Timer():
+    f(y).block_until_ready()
 ```

 Notice that the execution time increases, because now new versions of
@@ -352,14 +356,16 @@ If we run again, the code is dispatched to the correct compiled version and we
 get faster execution.

 ```{code-cell} ipython3
-%time f(y).block_until_ready()
+with qe.Timer():
+    f(y).block_until_ready()
 ```

 The compiled versions for the previous array size are still available in memory
 too, and the following call is dispatched to the correct compiled code.

 ```{code-cell} ipython3
-%time f(x).block_until_ready()
+with qe.Timer():
+    f(x).block_until_ready()
 ```

 ### Compiling the outer function
@@ -379,7 +385,8 @@ f_jit(x)
 And now let's time it.

 ```{code-cell} ipython3
-%time f_jit(x).block_until_ready()
+with qe.Timer():
+    f_jit(x).block_until_ready()
 ```

 Note the speed gain.
@@ -534,10 +541,10 @@ z_loops = np.empty((n, n))
 ```

 ```{code-cell} ipython3
-%%time
-for i in range(n):
-    for j in range(n):
-        z_loops[i, j] = f(x[i], y[j])
+with qe.Timer():
+    for i in range(n):
+        for j in range(n):
+            z_loops[i, j] = f(x[i], y[j])
 ```

 Even for this very small grid, the run time is extremely slow.
@@ -575,15 +582,15 @@ x_mesh, y_mesh = jnp.meshgrid(x, y)
 Now we get what we want and the execution time is very fast.

 ```{code-cell} ipython3
-%%time
-z_mesh = f(x_mesh, y_mesh).block_until_ready()
+with qe.Timer():
+    z_mesh = f(x_mesh, y_mesh).block_until_ready()
 ```

 Let's run again to eliminate compile time.

 ```{code-cell} ipython3
-%%time
-z_mesh = f(x_mesh, y_mesh).block_until_ready()
+with qe.Timer():
+    z_mesh = f(x_mesh, y_mesh).block_until_ready()
 ```

 Let's confirm that we got the right answer.
@@ -602,8 +609,8 @@ x_mesh, y_mesh = jnp.meshgrid(x, y)
 ```

 ```{code-cell} ipython3
-%%time
-z_mesh = f(x_mesh, y_mesh).block_until_ready()
+with qe.Timer():
+    z_mesh = f(x_mesh, y_mesh).block_until_ready()
 ```

 But there is one problem here: the mesh grids use a lot of memory.
@@ -641,8 +648,8 @@ f_vec = jax.vmap(f_vec_y, in_axes=(0, None))
 With this construction, we can now call the function $f$ on flat (low memory) arrays.

 ```{code-cell} ipython3
-%%time
-z_vmap = f_vec(x, y).block_until_ready()
+with qe.Timer():
+    z_vmap = f_vec(x, y).block_until_ready()
 ```

 The execution time is essentially the same as the mesh operation but we are using much less memory.
@@ -711,15 +718,15 @@ def compute_call_price_jax(β=β,
 Let's run it once to compile it:

 ```{code-cell} ipython3
-%%time
-compute_call_price_jax().block_until_ready()
+with qe.Timer():
+    compute_call_price_jax().block_until_ready()
 ```

 And now let's time it:

 ```{code-cell} ipython3
-%%time
-compute_call_price_jax().block_until_ready()
+with qe.Timer():
+    compute_call_price_jax().block_until_ready()
 ```

 ```{solution-end}
diff --git a/lectures/numba.md b/lectures/numba.md
index 9580196f..f110febe 100644
--- a/lectures/numba.md
+++ b/lectures/numba.md
@@ -41,6 +41,8 @@ import quantecon as qe
 import matplotlib.pyplot as plt
 ```

+
+
 ## Overview

 In an {doc}`earlier lecture <need_for_speed>` we learned about vectorization, which is one method to improve speed and efficiency in numerical work.
@@ -133,17 +135,17 @@ Let's time and compare identical function calls across these two versions, start
 ```{code-cell} ipython3
 n = 10_000_000

-qe.tic()
-qm(0.1, int(n))
-time1 = qe.toc()
+with qe.Timer() as timer1:
+    qm(0.1, int(n))
+time1 = timer1.elapsed
 ```

 Now let's try qm_numba

 ```{code-cell} ipython3
-qe.tic()
-qm_numba(0.1, int(n))
-time2 = qe.toc()
+with qe.Timer() as timer2:
+    qm_numba(0.1, int(n))
+time2 = timer2.elapsed
 ```

 This is already a very large speed gain.
@@ -153,9 +155,9 @@ In fact, the next time and all subsequent times it runs even faster as the funct

 (qm_numba_result)=
 ```{code-cell} ipython3
-qe.tic()
-qm_numba(0.1, int(n))
-time3 = qe.toc()
+with qe.Timer() as timer3:
+    qm_numba(0.1, int(n))
+time3 = timer3.elapsed
 ```

 ```{code-cell} ipython3
@@ -225,15 +227,13 @@ This is equivalent to adding `qm = jit(qm)` after the function definition.
 The following now uses the jitted version:

 ```{code-cell} ipython3
-%%time
-
-qm(0.1, 100_000)
+with qe.Timer():
+    qm(0.1, 100_000)
 ```

 ```{code-cell} ipython3
-%%time
-
-qm(0.1, 100_000)
+with qe.Timer():
+    qm(0.1, 100_000)
 ```

 Numba also provides several arguments for decorators to accelerate computation and cache functions -- see [here](https://numba.readthedocs.io/en/stable/user/performance-tips.html).
@@ -289,7 +289,8 @@ We can fix this error easily in this case by compiling `mean`.
 def mean(data):
     return np.mean(data)

-%time bootstrap(data, mean, n_resamples)
+with qe.Timer():
+    bootstrap(data, mean, n_resamples)
 ```

 ## Compiling Classes
@@ -534,11 +535,13 @@ def calculate_pi(n=1_000_000):
 Now let's see how fast it runs:

 ```{code-cell} ipython3
-%time calculate_pi()
+with qe.Timer():
+    calculate_pi()
 ```

 ```{code-cell} ipython3
-%time calculate_pi()
+with qe.Timer():
+    calculate_pi()
 ```

 If we switch off JIT compilation by removing `@njit`, the code takes around
@@ -639,9 +642,8 @@ This is (approximately) the right output.
 Now let's time it:

 ```{code-cell} ipython3
-qe.tic()
-compute_series(n)
-qe.toc()
+with qe.Timer():
+    compute_series(n)
 ```

 Next let's implement a Numba version, which is easy
@@ -660,9 +662,8 @@ print(np.mean(x == 0))
 Let's see the time

 ```{code-cell} ipython3
-qe.tic()
-compute_series_numba(n)
-qe.toc()
+with qe.Timer():
+    compute_series_numba(n)
 ```

 This is a nice speed improvement for one line of code!
diff --git a/lectures/numpy.md b/lectures/numpy.md
index 3efbf9fe..b030f589 100644
--- a/lectures/numpy.md
+++ b/lectures/numpy.md
@@ -63,6 +63,8 @@ from mpl_toolkits.mplot3d.axes3d import Axes3D
 from matplotlib import cm
 ```

+
+
 (numpy_array)=
 ## NumPy Arrays

@@ -1190,21 +1192,19 @@ n = 1_000_000
 ```

 ```{code-cell} python3
-%%time
-
-y = 0 # Will accumulate and store sum
-for i in range(n):
-    x = random.uniform(0, 1)
-    y += x**2
+with qe.Timer():
+    y = 0 # Will accumulate and store sum
+    for i in range(n):
+        x = random.uniform(0, 1)
+        y += x**2
 ```

 The following vectorized code achieves the same thing.

 ```{code-cell} ipython
-%%time
-
-x = np.random.uniform(0, 1, n)
-y = np.sum(x**2)
+with qe.Timer():
+    x = np.random.uniform(0, 1, n)
+    y = np.sum(x**2)
 ```

 As you can see, the second code block runs much faster. Why?
@@ -1285,24 +1285,22 @@ grid = np.linspace(-3, 3, 1000)
 Here's a non-vectorized version that uses Python loops.

 ```{code-cell} python3
-%%time
-
-m = -np.inf
+with qe.Timer():
+    m = -np.inf

-for x in grid:
-    for y in grid:
-        z = f(x, y)
-        if z > m:
-            m = z
+    for x in grid:
+        for y in grid:
+            z = f(x, y)
+            if z > m:
+                m = z
 ```

 And here's a vectorized version

 ```{code-cell} python3
-%%time
-
-x, y = np.meshgrid(grid, grid)
-np.max(f(x, y))
+with qe.Timer():
+    x, y = np.meshgrid(grid, grid)
+    np.max(f(x, y))
 ```

 In the vectorized version, all the looping takes place in compiled code.
@@ -1636,9 +1634,8 @@ np.random.seed(123)
 x = np.random.randn(1000, 100, 100)
 y = np.random.randn(100)

-qe.tic()
-B = x / y
-qe.toc()
+with qe.Timer("Broadcasting operation"):
+    B = x / y
 ```

 Here is the output
@@ -1696,14 +1693,13 @@ np.random.seed(123)
 x = np.random.randn(1000, 100, 100)
 y = np.random.randn(100)

-qe.tic()
-D = np.empty_like(x)
-d1, d2, d3 = x.shape
-for i in range(d1):
-    for j in range(d2):
-        for k in range(d3):
-            D[i, j, k] = x[i, j, k] / y[k]
-qe.toc()
+with qe.Timer("For loop operation"):
+    D = np.empty_like(x)
+    d1, d2, d3 = x.shape
+    for i in range(d1):
+        for j in range(d2):
+            for k in range(d3):
+                D[i, j, k] = x[i, j, k] / y[k]
 ```

 Note that the `for` loop takes much longer than the broadcasting operation.
diff --git a/lectures/parallelization.md b/lectures/parallelization.md
index 41f23835..34dc55b5 100644
--- a/lectures/parallelization.md
+++ b/lectures/parallelization.md
@@ -364,8 +364,8 @@ def compute_long_run_median(w0=1, T=1000, num_reps=50_000):
 Let's see how fast this runs:

 ```{code-cell} ipython
-%%time
-compute_long_run_median()
+with qe.Timer():
+    compute_long_run_median()
 ```

 To speed this up, we're going to parallelize it via multithreading.
@@ -391,8 +391,8 @@ def compute_long_run_median_parallel(w0=1, T=1000, num_reps=50_000):
 Let's look at the timing:

 ```{code-cell} ipython
-%%time
-compute_long_run_median_parallel()
+with qe.Timer():
+    compute_long_run_median_parallel()
 ```

 The speed-up is significant.
@@ -461,11 +461,13 @@ def calculate_pi(n=1_000_000):
 Now let's see how fast it runs:

 ```{code-cell} ipython3
-%time calculate_pi()
+with qe.Timer():
+    calculate_pi()
 ```

 ```{code-cell} ipython3
-%time calculate_pi()
+with qe.Timer():
+    calculate_pi()
 ```

 By switching parallelization on and off (selecting `True` or
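
Reviewer note: every hunk above leans on the same small API surface from `quantecon`, namely that `qe.Timer` works as a context manager that reports the elapsed time on exit, accepts an optional label string, and records the measurement in an `elapsed` attribute. Below is a minimal sketch of the three usage patterns the patch assumes; the array `x` and the computations are illustrative only, not taken from the lectures.

```python
import numpy as np
import quantecon as qe

x = np.random.uniform(0, 1, 1_000_000)

# Bare timer: reports elapsed time on exit, replacing the %time/%%time magics.
with qe.Timer():
    y = np.sum(x**2)

# Labeled timer, as used in the numpy.md hunks.
with qe.Timer("Vectorized sum"):
    y = np.sum(x**2)

# Capturing the measurement, replacing the removed qe.tic()/qe.toc() pairs.
with qe.Timer() as timer:
    y = np.sum(x**2)
print(timer.elapsed)  # run time recorded by the context manager (assumed to be seconds)
```

If the installed `quantecon` predates `Timer`, the `qe.tic()`/`qe.toc()` calls removed by this patch remain the fallback.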