Commit 29b51fa

Merge pull request #197 from QuantEcon/review_eigen_markov
Minor updates
2 parents b88ca4d + 1a91b67 commit 29b51fa

4 files changed: +53 -56 lines changed


lectures/eigen_I.md

Lines changed: 23 additions & 31 deletions
@@ -25,7 +25,7 @@ kernelspec:
 
 ## Overview
 
-Eigenvalues and eigenvectors are a somewhat advanced topic in linear and
+Eigenvalues and eigenvectors are an advanced topic in linear and
 matrix algebra.
 
 At the same time, these concepts are extremely useful for
@@ -36,14 +36,7 @@ At the same time, these concepts are extremely useful for
 * machine learning
 * and many other fields of science.
 
-In this lecture we explain the basics of eigenvalues and eigenvectors, and
-state two very important results from linear algebra.
-
-The first is called the Neumann series theorem and the second is called the
-Perron-Frobenius theorem.
-
-We will explain what these theorems tell us and how we can use them to
-understand the predictions of economic models.
+In this lecture we explain the basics of eigenvalues and eigenvectors.
 
 We assume in this lecture that students are familiar with matrices and
 understand the basics of matrix algebra.
@@ -89,10 +82,10 @@ a map transforming $x$ into $Ax$.
 
 Because $A$ is $n \times m$, it transforms $m$-vectors into $n$-vectors.
 
-We can write this formally as $A \colon \mathbb{R}^m \rightarrow \mathbb{R}^n$
+We can write this formally as $A \colon \mathbb{R}^m \rightarrow \mathbb{R}^n$.
 
-(You might argue that if $A$ is a function then we should write
-$A(x) = y$ rather than $Ax = y$ but the second notation is more conventional.)
+You might argue that if $A$ is a function then we should write
+$A(x) = y$ rather than $Ax = y$ but the second notation is more conventional.
 
 ### Square matrices
 
@@ -101,7 +94,7 @@ Let's restrict our discussion to square matrices.
 In the above discussion, this means that $m=n$ and $A$ maps $\mathbb R^n$ into
 itself.
 
-To repeat, $A$ is an $n \times n$ matrix that maps (or "transforms") a vector
+This means $A$ is an $n \times n$ matrix that maps (or "transforms") a vector
 $x$ in $\mathbb{R}^n$ into a new vector $y=Ax$ also in $\mathbb{R}^n$.
 
 Here's one example:
@@ -183,8 +176,8 @@ plt.show()
 
 One way to understand this transformation is that $A$
 
-* first rotates $x$ by some angle $\theta$
-* and then scales it by some scalar $\gamma$ to obtain the image $y$ of $x$.
+* first rotates $x$ by some angle $\theta$ and
+* then scales it by some scalar $\gamma$ to obtain the image $y$ of $x$.
 
 
 
@@ -198,7 +191,7 @@ instead of arrows.
 We consider how a given matrix transforms
 
 * a grid of points and
-* a set of points located on the unit circle in $\mathbb{R}^2$
+* a set of points located on the unit circle in $\mathbb{R}^2$.
 
 To build the transformations we will use two functions, called `grid_transform` and `circle_transform`.
 
@@ -498,10 +491,8 @@ same as first applying $B$ on $x$ and then applying $A$ on the vector $Bx$.
 
 Thus the matrix product $AB$ is the
 [composition](https://en.wikipedia.org/wiki/Function_composition) of the
-matrix transformations $A$ and $B$.
-
-(To compose the transformations, first apply transformation $B$ and then
-transformation $A$.)
+matrix transformations $A$ and $B$, which represents first applying transformation $B$ and then
+transformation $A$.
 
 When we matrix multiply an $n \times m$ matrix $A$ with an $m \times k$ matrix
 $B$ the obtained matrix product is an $n \times k$ matrix $AB$.
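As a quick numerical check of this composition rule, multiplying by $B$ and then by $A$ agrees with multiplying by $AB$ in one step (a sketch with arbitrarily chosen matrices and vector):

```python
import numpy as np

A = np.array([[0, 1],
              [-1, 0]])   # a rotation
B = np.array([[2, 0],
              [0, 2]])    # a scaling
x = np.array([1, 3])

# Applying B first and then A gives the same result as applying AB
print(np.allclose(A @ (B @ x), (A @ B) @ x))  # True
```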
@@ -590,11 +581,11 @@ grid_composition_transform(B,A) #transformation BA
 
 +++ {"user_expressions": []}
 
-It is quite evident that the transformation $AB$ is not the same as the transformation $BA$.
+It is evident that the transformation $AB$ is not the same as the transformation $BA$.
 
 ## Iterating on a fixed map
 
-In economics (and especially in dynamic modeling), we often are interested in
+In economics (and especially in dynamic modeling), we are often interested in
 analyzing behavior where we repeatedly apply a fixed matrix.
 
 For example, given a vector $v$ and a matrix $A$, we are interested in
@@ -603,7 +594,7 @@ studying the sequence
 $$
 v, \quad
 Av, \quad
-AAv = A^2v, \ldots
+AAv = A^2v, \quad \ldots
 $$
 
 Let's first see examples of a sequence of iterates $(A^k v)_{k \geq 0}$ under
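A minimal sketch of computing such a sequence of iterates in NumPy (the matrix and starting vector here are arbitrary examples):

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
v = np.array([1.0, 0.0])

# Compute v, Av, A^2 v, ..., A^5 v by repeated multiplication
iterates = [v]
for _ in range(5):
    iterates.append(A @ iterates[-1])

for k, w in enumerate(iterates):
    print(f"A^{k} v = {w}")
```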
@@ -721,13 +712,14 @@ In this section we introduce the notions of eigenvalues and eigenvectors.
 
 Let $A$ be an $n \times n$ square matrix.
 
 If $\lambda$ is scalar and $v$ is a non-zero $n$-vector such that
 
 $$
 A v = \lambda v
 $$
 
-then we say that $\lambda$ is an *eigenvalue* of $A$, and $v$ is an *eigenvector*.
+then we say that $\lambda$ is an *eigenvalue* of $A$, and $v$ is the corresponding *eigenvector*.
 
 Thus, an eigenvector of $A$ is a nonzero vector $v$ such that when the map $A$ is
 applied, $v$ is merely scaled.
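For instance, a sketch with a matrix whose eigenpairs are known in closed form:

```python
import numpy as np

A = np.array([[1, 2],
              [2, 1]])

# v = (1, 1) is an eigenvector of A with eigenvalue 3, since A v = 3 v
v = np.array([1, 1])
print(A @ v)   # [3 3]
print(3 * v)   # [3 3], so the map A merely scales v
```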
@@ -792,7 +784,7 @@ plt.show()
 
 So far our definition of eigenvalues and eigenvectors seems straightforward.
 
-There is, however, one complication we haven't mentioned yet:
+There is one complication we haven't mentioned yet:
 
 When solving $Av = \lambda v$,
 
@@ -812,7 +804,7 @@ The eigenvalue equation is equivalent to $(A - \lambda I) v = 0$.
 
 This equation has a nonzero solution $v$ only when the columns of $A - \lambda I$ are linearly dependent.
 
-This in turn is equivalent to stating that the determinant is zero.
+This in turn is equivalent to stating the determinant is zero.
 
 Hence, to find all eigenvalues, we can look for $\lambda$ such that the
 determinant of $A - \lambda I$ is zero.
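A sketch of this determinant condition, using SymPy to factor the characteristic polynomial of an example matrix:

```python
import sympy as sp

lam = sp.symbols('lam')
A = sp.Matrix([[1, 2],
               [2, 1]])

# Characteristic polynomial det(A - lam*I); its roots are the eigenvalues
char_poly = (A - lam * sp.eye(2)).det()
print(sp.factor(char_poly))      # (lam - 3)*(lam + 1)
print(sp.solve(char_poly, lam))  # [-1, 3]
```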
@@ -860,7 +852,7 @@ evecs #eigenvectors
 Note that the *columns* of `evecs` are the eigenvectors.
 
 Since any scalar multiple of an eigenvector is an eigenvector with the same
-eigenvalue (check it), the eig routine normalizes the length of each eigenvector
+eigenvalue (which can be verified), the `eig` routine normalizes the length of each eigenvector
 to one.
 
 The eigenvectors and eigenvalues of a map $A$ determine how a vector $v$ is transformed when we repeatedly multiply by $A$.
@@ -882,9 +874,9 @@ $$
 
 A thorough discussion of the method can be found [here](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter15.02-The-Power-Method.html).
 
-In this exercise, implement the power iteration method and use it to find the largest eigenvalue and its corresponding eigenvector.
+In this exercise, first implement the power iteration method and use it to find the largest eigenvalue and its corresponding eigenvector.
 
-Visualize the convergence.
+Then visualize the convergence.
 ```
 
 ```{solution-start} eig1_ex1
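For reference, a minimal sketch of power iteration as described in the exercise (not the lecture's own solution; it assumes the dominant eigenvalue is unique):

```python
import numpy as np

def power_iteration(A, num_iter=1000, tol=1e-10):
    """Approximate the dominant eigenvalue and eigenvector of A."""
    b = np.random.rand(A.shape[0])
    eigval = 0.0
    for _ in range(num_iter):
        b_new = A @ b
        b_new = b_new / np.linalg.norm(b_new)  # re-normalize each step
        new_eigval = b_new @ A @ b_new         # Rayleigh quotient
        if abs(new_eigval - eigval) < tol:
            break
        b, eigval = b_new, new_eigval
    return eigval, b

A = np.array([[1, 2],
              [2, 1]])
eigval, eigvec = power_iteration(A)
print(eigval)   # approximately 3, the dominant eigenvalue
print(eigvec)   # approximately (1, 1)/sqrt(2), up to sign
```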

lectures/eigen_II.md

Lines changed: 12 additions & 9 deletions
@@ -50,7 +50,7 @@ Often, in economics, the matrix that we are dealing with is nonnegative.
 
 Nonnegative matrices have several special and useful properties.
 
-In this section we discuss some of them --- in particular, the connection
+In this section we will discuss some of them --- in particular, the connection
 between nonnegativity and eigenvalues.
 
 Let $a^{k}_{ij}$ be element $(i,j)$ of $A^k$.
@@ -63,7 +63,7 @@ We denote this as $A \geq 0$.
 (irreducible)=
 ### Irreducible matrices
 
-We have (informally) introduced irreducible matrices in the Markov chain lecture (TODO: link to Markov chain lecture).
+We have (informally) introduced irreducible matrices in the [Markov chain lecture](markov_chains_II.md).
 
 Here we will introduce this concept formally.
 
@@ -157,9 +157,8 @@ This is a more common expression and where the name left eigenvectors originates
 For a nonnegative matrix $A$ the behavior of $A^k$ as $k \to \infty$ is controlled by the eigenvalue with the largest
 absolute value, often called the **dominant eigenvalue**.
 
-For a matrix $A$, the Perron-Frobenius Theorem characterizes certain
-properties of the dominant eigenvalue and its corresponding eigenvector when
-$A$ is a nonnegative square matrix.
+For a nonnegative square matrix $A$, the Perron-Frobenius Theorem characterizes certain
+properties of the dominant eigenvalue and its corresponding eigenvector.
 
 ```{prf:Theorem} Perron-Frobenius Theorem
 :label: perron-frobenius
@@ -179,7 +178,9 @@ If $A$ is primitive then,
 
 6. the inequality $|\lambda| \leq r(A)$ is **strict** for all eigenvalues $\lambda$ of $A$ distinct from $r(A)$, and
 7. with $v$ and $w$ normalized so that the inner product of $w$ and $v = 1$, we have
-$ r(A)^{-m} A^m$ converges to $v w^{\top}$ when $m \rightarrow \infty$. $v w^{\top}$ is called the **Perron projection** of $A$.
+$ r(A)^{-m} A^m$ converges to $v w^{\top}$ when $m \rightarrow \infty$.
+\
+The matrix $v w^{\top}$ is called the **Perron projection** of $A$.
 ```
 
 (This is a relatively simple version of the theorem --- for more details see
@@ -299,7 +300,7 @@ def check_convergence(M):
 
         # Calculate the norm of the difference matrix
         diff_norm = np.linalg.norm(diff, 'fro')
-        print(f"n = {n}, norm of the difference: {diff_norm:.10f}")
+        print(f"n = {n}, error = {diff_norm:.10f}")
 
 
 A1 = np.array([[1, 2],
@@ -394,6 +395,8 @@ In the {ref}`exercise<mc1_ex_1>`, we stated that the convergence rate is determi
 
 This can be proven using what we have learned here.
 
+Please note that we use $\mathbb{1}$ for a vector of ones in this lecture.
+
 With Markov model $M$ with state space $S$ and transition matrix $P$, we can write $P^t$ as
 
 $$
@@ -402,7 +405,7 @@ $$
 
 This is proven in {cite}`sargent2023economic` and a nice discussion can be found [here](https://math.stackexchange.com/questions/2433997/can-all-matrices-be-decomposed-as-product-of-right-and-left-eigenvector).
 
-In the formula $\lambda_i$ is an eigenvalue of $P$ and $v_i$ and $w_i$ are the right and left eigenvectors corresponding to $\lambda_i$.
+In this formula $\lambda_i$ is an eigenvalue of $P$ with corresponding right and left eigenvectors $v_i$ and $w_i$.
 
 Premultiplying $P^t$ by arbitrary $\psi \in \mathscr{D}(S)$ and rearranging now gives

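The decomposition of $P^t$ above can also be checked numerically. A sketch for an arbitrary two-state chain, using `scipy.linalg.eig` to obtain left and right eigenvectors:

```python
import numpy as np
from scipy.linalg import eig

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Eigenvalues, left eigenvectors (columns of W), right eigenvectors (columns of V)
eigvals, W, V = eig(P, left=True, right=True)

t = 5
# Sum of lambda_i^t v_i w_i^T, with w_i^T v_i normalized to one
approx = sum(
    eigvals[i]**t * np.outer(V[:, i], W[:, i]) / (W[:, i] @ V[:, i])
    for i in range(len(eigvals))
)
print(np.allclose(np.linalg.matrix_power(P, t), approx.real))  # True
```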
@@ -485,7 +488,7 @@ The following is a fundamental result in functional analysis that generalizes
 
 Let $A$ be a square matrix and let $A^k$ be the $k$-th power of $A$.
 
-Let $r(A)$ be the dominant eigenvector or as it is commonly called the *spectral radius*, defined as $\max_i |\lambda_i|$, where
+Let $r(A)$ be the **spectral radius** of $A$, defined as $\max_i |\lambda_i|$, where
 
 * $\{\lambda_i\}_i$ is the set of eigenvalues of $A$ and
 * $|\lambda_i|$ is the modulus of the complex number $\lambda_i$
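This definition translates directly into NumPy (a sketch):

```python
import numpy as np

A = np.array([[1, 2],
              [2, 1]])

# Spectral radius: largest modulus among the eigenvalues of A
r = max(abs(lam) for lam in np.linalg.eigvals(A))
print(r)  # 3.0
```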

lectures/markov_chains_I.md

Lines changed: 11 additions & 11 deletions
@@ -98,7 +98,7 @@ In other words,
 
 If $P$ is a stochastic matrix, then so is the $k$-th power $P^k$ for all $k \in \mathbb N$.
 
-Checking this is {ref}`one of the exercises <mc1_ex_3>` below.
+Checking this is {ref}`the first of the exercises <mc1_ex_3>` below.
 
 
 ### Markov chains
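Returning to the claim above that $P^k$ is stochastic, here is a quick numerical check (a sketch, separate from the exercise):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Each power of a stochastic matrix has nonnegative entries and unit row sums
for k in range(1, 5):
    Pk = np.linalg.matrix_power(P, k)
    print(k, (Pk >= 0).all(), np.allclose(Pk.sum(axis=1), 1))
```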
@@ -255,11 +255,11 @@ We'll cover some of these applications below.
 (mc_eg3)=
 #### Example 3
 
-Imam and Temple {cite}`imampolitical` categorize political institutions into three types: democracy (D), autocracy (A), and an intermediate state called anocracy (N).
+Imam and Temple {cite}`imampolitical` categorize political institutions into three types: democracy $\text{(D)}$, autocracy $\text{(A)}$, and an intermediate state called anocracy $\text{(N)}$.
 
-Each institution can have two potential development regimes: collapse (C) and growth (G). This results in six possible states: DG, DC, NG, NC, AG, and AC.
+Each institution can have two potential development regimes: collapse $\text{(C)}$ and growth $\text{(G)}$. This results in six possible states: $\text{DG, DC, NG, NC, AG}$ and $\text{AC}$.
 
-The lower probability of transitioning from NC to itself indicates that collapses in anocracies quickly evolve into changes in the political institution.
+The lower probability of transitioning from $\text{NC}$ to itself indicates that collapses in anocracies quickly evolve into changes in the political institution.
 
 Democracies tend to have longer-lasting growth regimes compared to autocracies as indicated by the lower probability of transitioning from growth to growth in autocracies.
 
@@ -393,7 +393,7 @@ In these exercises, we'll take the state space to be $S = 0,\ldots, n-1$.
 To simulate a Markov chain, we need
 
 1. a stochastic matrix $P$ and
-1. a probability mass function $\psi_0$ of length $n$ from which to draw a initial realization of $X_0$.
+1. a probability mass function $\psi_0$ of length $n$ from which to draw an initial realization of $X_0$.
 
 The Markov chain is then constructed as follows:
 
@@ -405,7 +405,7 @@ The Markov chain is then constructed as follows:
 To implement this simulation procedure, we need a method for generating draws
 from a discrete distribution.
 
-For this task, we'll use `random.draw` from [QuantEcon](http://quantecon.org/quantecon-py).
+For this task, we'll use `random.draw` from [QuantEcon.py](http://quantecon.org/quantecon-py).
 
 To use `random.draw`, we first need to convert the probability mass function
 to a cumulative distribution
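A sketch of that conversion and a draw, assuming QuantEcon.py is installed:

```python
import numpy as np
import quantecon as qe

psi_0 = (0.3, 0.7)        # probability mass function over {0, 1}
cdf = np.cumsum(psi_0)    # convert pmf to a cumulative distribution

# Draw 5 realizations of X_0 from psi_0
print(qe.random.draw(cdf, 5))
```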
@@ -491,7 +491,7 @@ always close to 0.25 (for the `P` matrix above).
 
 ### Using QuantEcon's routines
 
-[QuantEcon.py](http://quantecon.org/quantecon-py) has routines for handling Markov chains, including simulation.
+QuantEcon.py has routines for handling Markov chains, including simulation.
 
 Here's an illustration using the same $P$ as the preceding example
 
@@ -585,15 +585,15 @@ $$
 
 There are $n$ such equations, one for each $y \in S$.
 
-If we think of $\psi_{t+1}$ and $\psi_t$ as *row vectors*, these $n$ equations are summarized by the matrix expression
+If we think of $\psi_{t+1}$ and $\psi_t$ as row vectors, these $n$ equations are summarized by the matrix expression
 
 ```{math}
 :label: fin_mc_fr
 
 \psi_{t+1} = \psi_t P
 ```
 
-Thus, to move a distribution forward one unit of time, we postmultiply by $P$.
+Thus, we postmultiply by $P$ to move a distribution forward one unit of time.
 
 By postmultiplying $m$ times, we move a distribution forward $m$ steps into the future.
 
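For example, a sketch of moving a distribution forward $m$ steps (the chain and initial distribution here are arbitrary):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
psi_0 = np.array([0.5, 0.5])   # initial distribution as a row vector

m = 10
psi_m = psi_0 @ np.linalg.matrix_power(P, m)  # psi_m = psi_0 P^m
print(psi_m)
```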
@@ -671,7 +671,7 @@
 The distributions we have been studying can be viewed either
 
 1. as probabilities or
-1. as cross-sectional frequencies that a Law of Large Numbers leads us to anticipate for large samples.
+1. as cross-sectional frequencies that the Law of Large Numbers leads us to anticipate for large samples.
 
 To illustrate, recall our model of employment/unemployment dynamics for a given worker {ref}`discussed above <mc_eg1>`.
 
@@ -788,7 +788,7 @@ Not surprisingly it tends to zero as $\beta \to 0$, and to one as $\alpha \to 0$
 
 ### Calculating stationary distributions
 
-A stable algorithm for computing stationary distributions is implemented in [QuantEcon.py](http://quantecon.org/quantecon-py).
+A stable algorithm for computing stationary distributions is implemented in QuantEcon.py.
 
 Here's an example
 
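A minimal sketch of what such a computation looks like with QuantEcon.py's `MarkovChain` class (the matrix is an arbitrary example):

```python
import numpy as np
import quantecon as qe

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

mc = qe.MarkovChain(P)
print(mc.stationary_distributions)  # one row per stationary distribution
```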
lectures/markov_chains_II.md

Lines changed: 7 additions & 5 deletions
@@ -209,6 +209,8 @@ Theorem 5.2 of {cite}`haggstrom2002finite`.
 (ergodicity)=
 ## Ergodicity
 
+Please note that we use $\mathbb{1}$ for a vector of ones in this lecture.
+
 Under irreducibility, yet another important result obtains:
 
 ````{prf:theorem}
@@ -228,9 +230,9 @@ distribution, then, for all $x \in S$,
 
 Here
 
 * $\{X_t\}$ is a Markov chain with stochastic matrix $P$ and initial
 distribution $\psi_0$
-* $\mathbf{1}\{X_t = x\} = 1$ if $X_t = x$ and zero otherwise
+* $\mathbb{1} \{X_t = x\} = 1$ if $X_t = x$ and zero otherwise.
 
 The result in [theorem 4.3](llnfmc0) is sometimes called **ergodicity**.
 
@@ -242,7 +244,7 @@ This gives us another way to interpret the stationary distribution (provided irr
 
 Importantly, the result is valid for any choice of $\psi_0$.
 
-The theorem is related to {doc}`the law of large numbers <lln_clt>`.
+The theorem is related to {doc}`the Law of Large Numbers <lln_clt>`.
 
 It tells us that, in some settings, the law of large numbers sometimes holds even when the
 sequence of random variables is [not IID](iid_violation).
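To illustrate the theorem, a sketch comparing the time average of a simulated path with the stationary distribution (the chain is an arbitrary irreducible example):

```python
import numpy as np
import quantecon as qe

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
mc = qe.MarkovChain(P)

# Fraction of time the sample path spends in state 0
X = mc.simulate(ts_length=100_000)
print(np.mean(X == 0))                    # approximately 0.8
print(mc.stationary_distributions[0][0])  # 0.8
```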
@@ -394,7 +396,7 @@ Unlike other Markov chains we have seen before, it has a periodic cycle --- the
 
 This is called [periodicity](https://www.randomservices.org/random/markov/Periodicity.html).
 
-It is still irreducible, however, so ergodicity holds.
+It is still irreducible, so ergodicity holds.
 
 ```{code-cell} ipython3
 P = np.array([[0, 1],
@@ -424,7 +426,7 @@ for i in range(n):
 plt.show()
 ```
 
-This example helps to emphasize the fact that asymptotic stationarity is about the distribution, while ergodicity is about the sample path.
+This example helps to emphasize that asymptotic stationarity is about the distribution, while ergodicity is about the sample path.
 
 The proportion of time spent in a state can converge to the stationary distribution with periodic chains.
 
