Commit 29b51fa

Merge pull request #197 from QuantEcon/review_eigen_markov
Minor updates
2 parents b88ca4d + 1a91b67 commit 29b51fa

4 files changed: +53 -56 lines changed


lectures/eigen_I.md

Lines changed: 23 additions & 31 deletions
@@ -25,7 +25,7 @@ kernelspec:
 
 ## Overview
 
-Eigenvalues and eigenvectors are a somewhat advanced topic in linear and
+Eigenvalues and eigenvectors are an advanced topic in linear and
 matrix algebra.
 
 At the same time, these concepts are extremely useful for
@@ -36,14 +36,7 @@ At the same time, these concepts are extremely useful for
 * machine learning
 * and many other fields of science.
 
-In this lecture we explain the basics of eigenvalues and eigenvectors, and
-state two very important results from linear algebra.
-
-The first is called the Neumann series theorem and the second is called the
-Perron-Frobenius theorem.
-
-We will explain what these theorems tell us and how we can use them to
-understand the predictions of economic models.
+In this lecture we explain the basics of eigenvalues and eigenvectors.
 
 We assume in this lecture that students are familiar with matrices and
 understand the basics of matrix algebra.
@@ -89,10 +82,10 @@ a map transforming $x$ into $Ax$.
 
 Because $A$ is $n \times m$, it transforms $m$-vectors into $n$-vectors.
 
-We can write this formally as $A \colon \mathbb{R}^m \rightarrow \mathbb{R}^n$
+We can write this formally as $A \colon \mathbb{R}^m \rightarrow \mathbb{R}^n$.
 
-(You might argue that if $A$ is a function then we should write
-$A(x) = y$ rather than $Ax = y$ but the second notation is more conventional.)
+You might argue that if $A$ is a function then we should write
+$A(x) = y$ rather than $Ax = y$ but the second notation is more conventional.
 
 ### Square matrices
 
@@ -101,7 +94,7 @@ Let's restrict our discussion to square matrices.
 In the above discussion, this means that $m=n$ and $A$ maps $\mathbb R^n$ into
 itself.
 
-To repeat, $A$ is an $n \times n$ matrix that maps (or "transforms") a vector
+This means $A$ is an $n \times n$ matrix that maps (or "transforms") a vector
 $x$ in $\mathbb{R}^n$ into a new vector $y=Ax$ also in $\mathbb{R}^n$.
 
 Here's one example:
@@ -183,8 +176,8 @@ plt.show()
 
 One way to understand this transformation is that $A$
 
-* first rotates $x$ by some angle $\theta$
-* and then scales it by some scalar $\gamma$ to obtain the image $y$ of $x$.
+* first rotates $x$ by some angle $\theta$ and
+* then scales it by some scalar $\gamma$ to obtain the image $y$ of $x$.
 
 
 
@@ -198,7 +191,7 @@ instead of arrows.
 We consider how a given matrix transforms
 
 * a grid of points and
-* a set of points located on the unit circle in $\mathbb{R}^2$
+* a set of points located on the unit circle in $\mathbb{R}^2$.
 
 To build the transformations we will use two functions, called `grid_transform` and `circle_transform`.
 
@@ -498,10 +491,8 @@ same as first applying $B$ on $x$ and then applying $A$ on the vector $Bx$.
 
 Thus the matrix product $AB$ is the
 [composition](https://en.wikipedia.org/wiki/Function_composition) of the
-matrix transformations $A$ and $B$.
-
-(To compose the transformations, first apply transformation $B$ and then
-transformation $A$.)
+matrix transformations $A$ and $B$, which represents first applying transformation $B$ and then
+transformation $A$.
 
 When we matrix multiply an $n \times m$ matrix $A$ with an $m \times k$ matrix
 $B$ the obtained matrix product is an $n \times k$ matrix $AB$.
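As a quick numerical check of this composition rule, multiplying by $B$ and then by $A$ agrees with multiplying by $AB$ in one step (a sketch with arbitrarily chosen matrices and vector):

```python
import numpy as np

A = np.array([[0, 1],
              [-1, 0]])   # a rotation
B = np.array([[2, 0],
              [0, 2]])    # a scaling
x = np.array([1, 3])

# Applying B first and then A gives the same result as applying AB
print(np.allclose(A @ (B @ x), (A @ B) @ x))  # True
```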
@@ -590,11 +581,11 @@ grid_composition_transform(B,A) #transformation BA
 
 +++ {"user_expressions": []}
 
-It is quite evident that the transformation $AB$ is not the same as the transformation $BA$.
+It is evident that the transformation $AB$ is not the same as the transformation $BA$.
 
 ## Iterating on a fixed map
 
-In economics (and especially in dynamic modeling), we often are interested in
+In economics (and especially in dynamic modeling), we are often interested in
 analyzing behavior where we repeatedly apply a fixed matrix.
 
 For example, given a vector $v$ and a matrix $A$, we are interested in
@@ -603,7 +594,7 @@ studying the sequence
 $$
 v, \quad
 Av, \quad
-AAv = A^2v, \ldots
+AAv = A^2v, \quad \ldots
 $$
 
 Let's first see examples of a sequence of iterates $(A^k v)_{k \geq 0}$ under
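A minimal sketch of computing such a sequence of iterates in NumPy (the matrix and starting vector here are arbitrary examples):

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
v = np.array([1.0, 0.0])

# Compute v, Av, A^2 v, ..., A^5 v by repeated multiplication
iterates = [v]
for _ in range(5):
    iterates.append(A @ iterates[-1])

for k, w in enumerate(iterates):
    print(f"A^{k} v = {w}")
```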
@@ -721,13 +712,14 @@ In this section we introduce the notions of eigenvalues and eigenvectors.
 
 Let $A$ be an $n \times n$ square matrix.
 
 If $\lambda$ is scalar and $v$ is a non-zero $n$-vector such that
 
 $$
 A v = \lambda v
 $$
 
-then we say that $\lambda$ is an *eigenvalue* of $A$, and $v$ is an *eigenvector*.
+then we say that $\lambda$ is an *eigenvalue* of $A$, and $v$ is the corresponding *eigenvector*.
 
 Thus, an eigenvector of $A$ is a nonzero vector $v$ such that when the map $A$ is
 applied, $v$ is merely scaled.
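For instance, a sketch with a matrix whose eigenpairs are known in closed form:

```python
import numpy as np

A = np.array([[1, 2],
              [2, 1]])

# v = (1, 1) is an eigenvector of A with eigenvalue 3, since A v = 3 v
v = np.array([1, 1])
print(A @ v)   # [3 3]
print(3 * v)   # [3 3], so the map A merely scales v
```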
@@ -792,7 +784,7 @@ plt.show()
 
 So far our definition of eigenvalues and eigenvectors seems straightforward.
 
-There is, however, one complication we haven't mentioned yet:
+There is one complication we haven't mentioned yet:
 
 When solving $Av = \lambda v$,
 
@@ -812,7 +804,7 @@ The eigenvalue equation is equivalent to $(A - \lambda I) v = 0$.
 
 This equation has a nonzero solution $v$ only when the columns of $A - \lambda I$ are linearly dependent.
 
-This in turn is equivalent to stating that the determinant is zero.
+This in turn is equivalent to stating the determinant is zero.
 
 Hence, to find all eigenvalues, we can look for $\lambda$ such that the
 determinant of $A - \lambda I$ is zero.
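A sketch of this determinant condition, using SymPy to factor the characteristic polynomial of an example matrix:

```python
import sympy as sp

lam = sp.symbols('lam')
A = sp.Matrix([[1, 2],
               [2, 1]])

# Characteristic polynomial det(A - lam*I); its roots are the eigenvalues
char_poly = (A - lam * sp.eye(2)).det()
print(sp.factor(char_poly))      # (lam - 3)*(lam + 1)
print(sp.solve(char_poly, lam))  # [-1, 3]
```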
@@ -860,7 +852,7 @@ evecs #eigenvectors
 Note that the *columns* of `evecs` are the eigenvectors.
 
 Since any scalar multiple of an eigenvector is an eigenvector with the same
-eigenvalue (check it), the eig routine normalizes the length of each eigenvector
+eigenvalue (which can be verified), the `eig` routine normalizes the length of each eigenvector
 to one.
 
 The eigenvectors and eigenvalues of a map $A$ determine how a vector $v$ is transformed when we repeatedly multiply by $A$.
@@ -882,9 +874,9 @@ $$
 
 A thorough discussion of the method can be found [here](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter15.02-The-Power-Method.html).
 
-In this exercise, implement the power iteration method and use it to find the largest eigenvalue and its corresponding eigenvector.
+In this exercise, first implement the power iteration method and use it to find the largest eigenvalue and its corresponding eigenvector.
 
-Visualize the convergence.
+Then visualize the convergence.
 ```
 
 ```{solution-start} eig1_ex1
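For reference, a minimal sketch of power iteration as described in the exercise (not the lecture's own solution; it assumes the dominant eigenvalue is unique):

```python
import numpy as np

def power_iteration(A, num_iter=1000, tol=1e-10):
    """Approximate the dominant eigenvalue and eigenvector of A."""
    b = np.random.rand(A.shape[0])
    eigval = 0.0
    for _ in range(num_iter):
        b_new = A @ b
        b_new = b_new / np.linalg.norm(b_new)  # re-normalize each step
        new_eigval = b_new @ A @ b_new         # Rayleigh quotient
        if abs(new_eigval - eigval) < tol:
            break
        b, eigval = b_new, new_eigval
    return eigval, b

A = np.array([[1, 2],
              [2, 1]])
eigval, eigvec = power_iteration(A)
print(eigval)   # approximately 3, the dominant eigenvalue
print(eigvec)   # approximately (1, 1)/sqrt(2), up to sign
```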

lectures/eigen_II.md

Lines changed: 12 additions & 9 deletions
@@ -50,7 +50,7 @@ Often, in economics, the matrix that we are dealing with is nonnegative.
 
 Nonnegative matrices have several special and useful properties.
 
-In this section we discuss some of them --- in particular, the connection
+In this section we will discuss some of them --- in particular, the connection
 between nonnegativity and eigenvalues.
 
 Let $a^{k}_{ij}$ be element $(i,j)$ of $A^k$.
@@ -63,7 +63,7 @@ We denote this as $A \geq 0$.
 (irreducible)=
 ### Irreducible matrices
 
-We have (informally) introduced irreducible matrices in the Markov chain lecture (TODO: link to Markov chain lecture).
+We have (informally) introduced irreducible matrices in the [Markov chain lecture](markov_chains_II.md).
 
 Here we will introduce this concept formally.
 
@@ -157,9 +157,8 @@ This is a more common expression and where the name left eigenvectors originates
 For a nonnegative matrix $A$ the behavior of $A^k$ as $k \to \infty$ is controlled by the eigenvalue with the largest
 absolute value, often called the **dominant eigenvalue**.
 
-For a matrix $A$, the Perron-Frobenius Theorem characterizes certain
-properties of the dominant eigenvalue and its corresponding eigenvector when
-$A$ is a nonnegative square matrix.
+For a nonnegative square matrix $A$, the Perron-Frobenius Theorem characterizes certain
+properties of the dominant eigenvalue and its corresponding eigenvector.
 
 ```{prf:Theorem} Perron-Frobenius Theorem
 :label: perron-frobenius
@@ -179,7 +178,9 @@ If $A$ is primitive then,
 
 6. the inequality $|\lambda| \leq r(A)$ is **strict** for all eigenvalues $\lambda$ of $A$ distinct from $r(A)$, and
 7. with $v$ and $w$ normalized so that the inner product of $w$ and $v = 1$, we have
-$ r(A)^{-m} A^m$ converges to $v w^{\top}$ when $m \rightarrow \infty$. $v w^{\top}$ is called the **Perron projection** of $A$.
+$ r(A)^{-m} A^m$ converges to $v w^{\top}$ when $m \rightarrow \infty$.
+\
+The matrix $v w^{\top}$ is called the **Perron projection** of $A$.
 ```
 
 (This is a relatively simple version of the theorem --- for more details see
@@ -299,7 +300,7 @@ def check_convergence(M):
 
         # Calculate the norm of the difference matrix
         diff_norm = np.linalg.norm(diff, 'fro')
-        print(f"n = {n}, norm of the difference: {diff_norm:.10f}")
+        print(f"n = {n}, error = {diff_norm:.10f}")
 
 
 A1 = np.array([[1, 2],
@@ -394,6 +395,8 @@ In the {ref}`exercise<mc1_ex_1>`, we stated that the convergence rate is determi
 
 This can be proven using what we have learned here.
 
+Please note that we use $\mathbb{1}$ for a vector of ones in this lecture.
+
 With Markov model $M$ with state space $S$ and transition matrix $P$, we can write $P^t$ as
 
 $$
@@ -402,7 +405,7 @@ $$
 
 This is proven in {cite}`sargent2023economic` and a nice discussion can be found [here](https://math.stackexchange.com/questions/2433997/can-all-matrices-be-decomposed-as-product-of-right-and-left-eigenvector).
 
-In the formula $\lambda_i$ is an eigenvalue of $P$ and $v_i$ and $w_i$ are the right and left eigenvectors corresponding to $\lambda_i$.
+In this formula $\lambda_i$ is an eigenvalue of $P$ with corresponding right and left eigenvectors $v_i$ and $w_i$.
 
 Premultiplying $P^t$ by arbitrary $\psi \in \mathscr{D}(S)$ and rearranging now gives

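The decomposition of $P^t$ above can also be checked numerically. A sketch for an arbitrary two-state chain, using `scipy.linalg.eig` to obtain left and right eigenvectors:

```python
import numpy as np
from scipy.linalg import eig

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Eigenvalues, left eigenvectors (columns of W), right eigenvectors (columns of V)
eigvals, W, V = eig(P, left=True, right=True)

t = 5
# Sum of lambda_i^t v_i w_i^T, with w_i^T v_i normalized to one
approx = sum(
    eigvals[i]**t * np.outer(V[:, i], W[:, i]) / (W[:, i] @ V[:, i])
    for i in range(len(eigvals))
)
print(np.allclose(np.linalg.matrix_power(P, t), approx.real))  # True
```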
@@ -485,7 +488,7 @@ The following is a fundamental result in functional analysis that generalizes
 
 Let $A$ be a square matrix and let $A^k$ be the $k$-th power of $A$.
 
-Let $r(A)$ be the dominant eigenvector or as it is commonly called the *spectral radius*, defined as $\max_i |\lambda_i|$, where
+Let $r(A)$ be the **spectral radius** of $A$, defined as $\max_i |\lambda_i|$, where
 
 * $\{\lambda_i\}_i$ is the set of eigenvalues of $A$ and
 * $|\lambda_i|$ is the modulus of the complex number $\lambda_i$
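This definition translates directly into NumPy (a sketch):

```python
import numpy as np

A = np.array([[1, 2],
              [2, 1]])

# Spectral radius: largest modulus among the eigenvalues of A
r = max(abs(lam) for lam in np.linalg.eigvals(A))
print(r)  # 3.0
```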

lectures/markov_chains_I.md

Lines changed: 11 additions & 11 deletions
@@ -98,7 +98,7 @@ In other words,
 
 If $P$ is a stochastic matrix, then so is the $k$-th power $P^k$ for all $k \in \mathbb N$.
 
-Checking this is {ref}`one of the exercises <mc1_ex_3>` below.
+Checking this is {ref}`the first of the exercises <mc1_ex_3>` below.
 
 
 ### Markov chains
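Returning to the claim above that $P^k$ is stochastic, here is a quick numerical check (a sketch, separate from the exercise):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Each power of a stochastic matrix has nonnegative entries and unit row sums
for k in range(1, 5):
    Pk = np.linalg.matrix_power(P, k)
    print(k, (Pk >= 0).all(), np.allclose(Pk.sum(axis=1), 1))
```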
@@ -255,11 +255,11 @@ We'll cover some of these applications below.
 (mc_eg3)=
 #### Example 3
 
-Imam and Temple {cite}`imampolitical` categorize political institutions into three types: democracy (D), autocracy (A), and an intermediate state called anocracy (N).
+Imam and Temple {cite}`imampolitical` categorize political institutions into three types: democracy $\text{(D)}$, autocracy $\text{(A)}$, and an intermediate state called anocracy $\text{(N)}$.
 
-Each institution can have two potential development regimes: collapse (C) and growth (G). This results in six possible states: DG, DC, NG, NC, AG, and AC.
+Each institution can have two potential development regimes: collapse $\text{(C)}$ and growth $\text{(G)}$. This results in six possible states: $\text{DG, DC, NG, NC, AG}$ and $\text{AC}$.
 
-The lower probability of transitioning from NC to itself indicates that collapses in anocracies quickly evolve into changes in the political institution.
+The lower probability of transitioning from $\text{NC}$ to itself indicates that collapses in anocracies quickly evolve into changes in the political institution.
 
 Democracies tend to have longer-lasting growth regimes compared to autocracies as indicated by the lower probability of transitioning from growth to growth in autocracies.
 
@@ -393,7 +393,7 @@ In these exercises, we'll take the state space to be $S = 0,\ldots, n-1$.
 To simulate a Markov chain, we need
 
 1. a stochastic matrix $P$ and
-1. a probability mass function $\psi_0$ of length $n$ from which to draw a initial realization of $X_0$.
+1. a probability mass function $\psi_0$ of length $n$ from which to draw an initial realization of $X_0$.
 
 The Markov chain is then constructed as follows:
 
@@ -405,7 +405,7 @@ The Markov chain is then constructed as follows:
 To implement this simulation procedure, we need a method for generating draws
 from a discrete distribution.
 
-For this task, we'll use `random.draw` from [QuantEcon](http://quantecon.org/quantecon-py).
+For this task, we'll use `random.draw` from [QuantEcon.py](http://quantecon.org/quantecon-py).
 
 To use `random.draw`, we first need to convert the probability mass function
 to a cumulative distribution
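A sketch of that conversion and a draw, assuming QuantEcon.py is installed:

```python
import numpy as np
import quantecon as qe

psi_0 = (0.3, 0.7)        # probability mass function over {0, 1}
cdf = np.cumsum(psi_0)    # convert pmf to a cumulative distribution

# Draw 5 realizations of X_0 from psi_0
print(qe.random.draw(cdf, 5))
```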
@@ -491,7 +491,7 @@ always close to 0.25 (for the `P` matrix above).
 
 ### Using QuantEcon's routines
 
-[QuantEcon.py](http://quantecon.org/quantecon-py) has routines for handling Markov chains, including simulation.
+QuantEcon.py has routines for handling Markov chains, including simulation.
 
 Here's an illustration using the same $P$ as the preceding example
 
@@ -585,15 +585,15 @@ $$
 
 There are $n$ such equations, one for each $y \in S$.
 
-If we think of $\psi_{t+1}$ and $\psi_t$ as *row vectors*, these $n$ equations are summarized by the matrix expression
+If we think of $\psi_{t+1}$ and $\psi_t$ as row vectors, these $n$ equations are summarized by the matrix expression
 
 ```{math}
 :label: fin_mc_fr
 
 \psi_{t+1} = \psi_t P
 ```
 
-Thus, to move a distribution forward one unit of time, we postmultiply by $P$.
+Thus, we postmultiply by $P$ to move a distribution forward one unit of time.
 
 By postmultiplying $m$ times, we move a distribution forward $m$ steps into the future.
 
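For example, a sketch of moving a distribution forward $m$ steps (the chain and initial distribution here are arbitrary):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
psi_0 = np.array([0.5, 0.5])   # initial distribution as a row vector

m = 10
psi_m = psi_0 @ np.linalg.matrix_power(P, m)  # psi_m = psi_0 P^m
print(psi_m)
```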
@@ -671,7 +671,7 @@
 The distributions we have been studying can be viewed either
 
 1. as probabilities or
-1. as cross-sectional frequencies that a Law of Large Numbers leads us to anticipate for large samples.
+1. as cross-sectional frequencies that the Law of Large Numbers leads us to anticipate for large samples.
 
 To illustrate, recall our model of employment/unemployment dynamics for a given worker {ref}`discussed above <mc_eg1>`.
 
@@ -788,7 +788,7 @@ Not surprisingly it tends to zero as $\beta \to 0$, and to one as $\alpha \to 0$
 
 ### Calculating stationary distributions
 
-A stable algorithm for computing stationary distributions is implemented in [QuantEcon.py](http://quantecon.org/quantecon-py).
+A stable algorithm for computing stationary distributions is implemented in QuantEcon.py.
 
 Here's an example
 
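A minimal sketch of what such a computation looks like with QuantEcon.py's `MarkovChain` class (the matrix is an arbitrary example):

```python
import numpy as np
import quantecon as qe

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

mc = qe.MarkovChain(P)
print(mc.stationary_distributions)  # one row per stationary distribution
```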
lectures/markov_chains_II.md

Lines changed: 7 additions & 5 deletions
@@ -209,6 +209,8 @@ Theorem 5.2 of {cite}`haggstrom2002finite`.
 (ergodicity)=
 ## Ergodicity
 
+Please note that we use $\mathbb{1}$ for a vector of ones in this lecture.
+
 Under irreducibility, yet another important result obtains:
 
 ````{prf:theorem}
@@ -228,9 +230,9 @@ distribution, then, for all $x \in S$,
 
 Here
 
 * $\{X_t\}$ is a Markov chain with stochastic matrix $P$ and initial
 distribution $\psi_0$
-* $\mathbf{1}\{X_t = x\} = 1$ if $X_t = x$ and zero otherwise
+* $\mathbb{1} \{X_t = x\} = 1$ if $X_t = x$ and zero otherwise.
 
 The result in [theorem 4.3](llnfmc0) is sometimes called **ergodicity**.
 
@@ -242,7 +244,7 @@ This gives us another way to interpret the stationary distribution (provided irr
 
 Importantly, the result is valid for any choice of $\psi_0$.
 
-The theorem is related to {doc}`the law of large numbers <lln_clt>`.
+The theorem is related to {doc}`the Law of Large Numbers <lln_clt>`.
 
 It tells us that, in some settings, the law of large numbers sometimes holds even when the
 sequence of random variables is [not IID](iid_violation).
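To illustrate the theorem, a sketch comparing the time average of a simulated path with the stationary distribution (the chain is an arbitrary irreducible example):

```python
import numpy as np
import quantecon as qe

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
mc = qe.MarkovChain(P)

# Fraction of time the sample path spends in state 0
X = mc.simulate(ts_length=100_000)
print(np.mean(X == 0))                    # approximately 0.8
print(mc.stationary_distributions[0][0])  # 0.8
```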
@@ -394,7 +396,7 @@ Unlike other Markov chains we have seen before, it has a periodic cycle --- the
 
 This is called [periodicity](https://www.randomservices.org/random/markov/Periodicity.html).
 
-It is still irreducible, however, so ergodicity holds.
+It is still irreducible, so ergodicity holds.
 
 ```{code-cell} ipython3
 P = np.array([[0, 1],
@@ -424,7 +426,7 @@ for i in range(n):
 plt.show()
 ```
 
-This example helps to emphasize the fact that asymptotic stationarity is about the distribution, while ergodicity is about the sample path.
+This example helps to emphasize that asymptotic stationarity is about the distribution, while ergodicity is about the sample path.
 
 The proportion of time spent in a state can converge to the stationary distribution with periodic chains.
 
