diff --git a/lectures/eigen_I.md b/lectures/eigen_I.md
index 1d48220b..d53faa92 100644
--- a/lectures/eigen_I.md
+++ b/lectures/eigen_I.md
@@ -25,7 +25,7 @@ kernelspec:

## Overview

-Eigenvalues and eigenvectors are a somewhat advanced topic in linear and
+Eigenvalues and eigenvectors are an advanced topic in linear and
matrix algebra.

At the same time, these concepts are extremely useful for
@@ -36,14 +36,7 @@ At the same time, these concepts are extremely useful for
* machine learning
* and many other fields of science.

-In this lecture we explain the basics of eigenvalues and eigenvectors, and
-state two very important results from linear algebra.
-
-The first is called the Neumann series theorem and the second is called the
-Perron-Frobenius theorem.
-
-We will explain what these theorems tell us and how we can use them to
-understand the predictions of economic models.
+In this lecture we explain the basics of eigenvalues and eigenvectors.

We assume in this lecture that students are familiar with matrices and understand the basics of matrix algebra.
@@ -89,10 +82,10 @@ a map transforming $x$ into $Ax$.

Because $A$ is $n \times m$, it transforms $m$-vectors into $n$-vectors.

-We can write this formally as $A \colon \mathbb{R}^m \rightarrow \mathbb{R}^n$
+We can write this formally as $A \colon \mathbb{R}^m \rightarrow \mathbb{R}^n$.

-(You might argue that if $A$ is a function then we should write
-$A(x) = y$ rather than $Ax = y$ but the second notation is more conventional.)
+You might argue that if $A$ is a function then we should write
+$A(x) = y$ rather than $Ax = y$, but the second notation is more conventional.

### Square matrices

@@ -101,7 +94,7 @@ Let's restrict our discussion to square matrices.

In the above discussion, this means that $m=n$ and $A$ maps $\mathbb R^n$ into itself.

-To repeat, $A$ is an $n \times n$ matrix that maps (or "transforms") a vector
+This means $A$ is an $n \times n$ matrix that maps (or "transforms") a vector
$x$ in $\mathbb{R}^n$ into a new vector $y=Ax$ also in $\mathbb{R}^n$.

Here's one example:
@@ -183,8 +176,8 @@ plt.show()

One way to understand this transformation is that $A$

-* first rotates $x$ by some angle $\theta$
-* and then scales it by some scalar $\gamma$ to obtain the image $y$ of $x$.
+* first rotates $x$ by some angle $\theta$ and
+* then scales it by some scalar $\gamma$ to obtain the image $y$ of $x$.

@@ -198,7 +191,7 @@ instead of arrows.

We consider how a given matrix transforms

* a grid of points and
-* a set of points located on the unit circle in $\mathbb{R}^2$
+* a set of points located on the unit circle in $\mathbb{R}^2$.

To build the transformations we will use two functions, called `grid_transform` and `circle_transform`.
@@ -498,10 +491,8 @@ same as first applying $B$ on $x$ and then applying $A$ on the vector $Bx$.

Thus the matrix product $AB$ is the
[composition](https://en.wikipedia.org/wiki/Function_composition) of the
-matrix transformations $A$ and $B$.
-
-(To compose the transformations, first apply transformation $B$ and then
-transformation $A$.)
+matrix transformations $A$ and $B$, which represents first applying transformation $B$ and then
+transformation $A$.

When we matrix multiply an $n \times m$ matrix $A$ with an $m \times k$ matrix
$B$ the obtained matrix product is an $n \times k$ matrix $AB$.
@@ -590,11 +581,11 @@ grid_composition_transform(B,A) #transformation BA

+++ {"user_expressions": []}

-It is quite evident that the transformation $AB$ is not the same as the transformation $BA$.
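+We can also confirm numerically that matrix multiplication is not commutative (a minimal sketch using two small illustrative matrices, not necessarily the ones plotted above):
+
+```{code-cell} ipython3
+import numpy as np
+
+# Two small illustrative matrices (chosen only for this check)
+A = np.array([[0, 1],
+              [-1, 0]])
+B = np.array([[1, 2],
+              [0, 1]])
+
+print(A @ B)
+print(B @ A)
+print(np.allclose(A @ B, B @ A))  # False: the two products differ
+```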
+It is evident that the transformation $AB$ is not the same as the transformation $BA$.

## Iterating on a fixed map

-In economics (and especially in dynamic modeling), we often are interested in
+In economics (and especially in dynamic modeling), we are often interested in
analyzing behavior where we repeatedly apply a fixed matrix.

For example, given a vector $v$ and a matrix $A$, we are interested in
@@ -603,7 +594,7 @@ studying the sequence

$$
v, \quad
Av, \quad
-AAv = A^2v, \ldots
+AAv = A^2v, \quad \ldots
$$

Let's first see examples of a sequence of iterates $(A^k v)_{k \geq 0}$ under
@@ -721,13 +712,14 @@ In this section we introduce the notions of eigenvalues and eigenvectors.

Let $A$ be an $n \times n$ square matrix.

-If $\lambda$ is scalar and $v$ is a non-zero $n$-vector such that
+If $\lambda$ is a scalar and $v$ is a non-zero $n$-vector such that

$$
-A v = \lambda v
+A v = \lambda v,
$$

-then we say that $\lambda$ is an *eigenvalue* of $A$, and $v$ is an *eigenvector*.
+then we say that $\lambda$ is an *eigenvalue* of $A$, and $v$ is the corresponding *eigenvector*.

Thus, an eigenvector of $A$ is a nonzero vector $v$ such that when the map $A$ is applied, $v$ is merely scaled.
@@ -792,7 +784,7 @@ plt.show()

So far our definition of eigenvalues and eigenvectors seems straightforward.

-There is, however, one complication we haven't mentioned yet:
+There is one complication we haven't mentioned yet:

When solving $Av = \lambda v$,
@@ -812,7 +804,7 @@ The eigenvalue equation is equivalent to $(A - \lambda I) v = 0$.

This equation has a nonzero solution $v$ only when the columns of $A - \lambda I$ are linearly dependent.

-This in turn is equivalent to stating that the determinant is zero.
+This in turn is equivalent to stating the determinant is zero.

Hence, to find all eigenvalues, we can look for $\lambda$ such that the
determinant of $A - \lambda I$ is zero.
@@ -860,7 +852,7 @@ evecs #eigenvectors

Note that the *columns* of `evecs` are the eigenvectors.

Since any scalar multiple of an eigenvector is an eigenvector with the same
-eigenvalue (check it), the eig routine normalizes the length of each eigenvector
+eigenvalue (which can be verified), the `eig` routine normalizes the length of each eigenvector
to one.

The eigenvectors and eigenvalues of a map $A$ determine how a vector $v$ is transformed when we repeatedly multiply by $A$.
@@ -882,9 +874,9 @@ $$

A thorough discussion of the method can be found [here](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter15.02-The-Power-Method.html).

-In this exercise, implement the power iteration method and use it to find the largest eigenvalue and its corresponding eigenvector.
+In this exercise, first implement the power iteration method and use it to find the largest eigenvalue and its corresponding eigenvector.

-Visualize the convergence.
+Then visualize the convergence.
```

```{solution-start} eig1_ex1
diff --git a/lectures/eigen_II.md b/lectures/eigen_II.md
index d399ba84..4eda195d 100644
--- a/lectures/eigen_II.md
+++ b/lectures/eigen_II.md
@@ -50,7 +50,7 @@ Often, in economics, the matrix that we are dealing with is nonnegative.

Nonnegative matrices have several special and useful properties.

-In this section we discuss some of them --- in particular, the connection
+In this section we will discuss some of them --- in particular, the connection
between nonnegativity and eigenvalues.

Let $a^{k}_{ij}$ be element $(i,j)$ of $A^k$.

We denote this as $A \geq 0$.
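+Nonnegativity is easy to check numerically (a minimal sketch with an arbitrarily chosen matrix, used only for illustration):
+
+```{code-cell} ipython3
+import numpy as np
+
+# An arbitrarily chosen nonnegative matrix
+A = np.array([[0.5, 0.3],
+              [0.2, 0.8]])
+
+print(np.all(A >= 0))                             # True: A >= 0
+print(np.all(np.linalg.matrix_power(A, 3) >= 0))  # True: A^3 is also nonnegative
+```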
(irreducible)=
### Irreducible matrices

-We have (informally) introduced irreducible matrices in the Markov chain lecture (TODO: link to Markov chain lecture).
+We have (informally) introduced irreducible matrices in the [Markov chain lecture](markov_chains_II.md).

Here we will introduce this concept formally.
@@ -157,9 +157,8 @@ This is a more common expression and where the name left eigenvectors originates

For a nonnegative matrix $A$ the behavior of $A^k$ as $k \to \infty$ is controlled by the eigenvalue with the largest absolute value, often called the **dominant eigenvalue**.

-For a matrix $A$, the Perron-Frobenius Theorem characterizes certain
-properties of the dominant eigenvalue and its corresponding eigenvector when
-$A$ is a nonnegative square matrix.
+For a nonnegative square matrix $A$, the Perron-Frobenius Theorem characterizes certain
+properties of the dominant eigenvalue and its corresponding eigenvector.

```{prf:Theorem} Perron-Frobenius Theorem
:label: perron-frobenius
@@ -179,7 +178,9 @@ If $A$ is primitive then,

6. the inequality $|\lambda| \leq r(A)$ is **strict** for all eigenvalues $\lambda$ of $A$ distinct from $r(A)$, and
7. with $v$ and $w$ normalized so that the inner product of $w$ and $v = 1$, we have
-$ r(A)^{-m} A^m$ converges to $v w^{\top}$ when $m \rightarrow \infty$. $v w^{\top}$ is called the **Perron projection** of $A$.
+$r(A)^{-m} A^m$ converges to $v w^{\top}$ as $m \rightarrow \infty$.
+
+The matrix $v w^{\top}$ is called the **Perron projection** of $A$.
```

(This is a relatively simple version of the theorem --- for more details see
@@ -299,7 +300,7 @@ def check_convergence(M):

# Calculate the norm of the difference matrix
diff_norm = np.linalg.norm(diff, 'fro')
-print(f"n = {n}, norm of the difference: {diff_norm:.10f}")
+print(f"n = {n}, error = {diff_norm:.10f}")

A1 = np.array([[1, 2],
@@ -394,6 +395,8 @@ In the {ref}`exercise`, we stated that the convergence rate is determi

This can be proven using what we have learned here.

+Note that we use $\mathbb{1}$ for a vector of ones in this lecture.
+
With Markov model $M$ with state space $S$ and transition matrix $P$, we can write $P^t$ as

$$
@@ -402,7 +405,7 @@ $$

This is proven in {cite}`sargent2023economic` and a nice discussion can be found [here](https://math.stackexchange.com/questions/2433997/can-all-matrices-be-decomposed-as-product-of-right-and-left-eigenvector).

-In the formula $\lambda_i$ is an eigenvalue of $P$ and $v_i$ and $w_i$ are the right and left eigenvectors corresponding to $\lambda_i$.
+In this formula, $\lambda_i$ is an eigenvalue of $P$ with corresponding right and left eigenvectors $v_i$ and $w_i$.

Premultiplying $P^t$ by arbitrary $\psi \in \mathscr{D}(S)$ and rearranging now gives
@@ -485,7 +488,7 @@ The following is a fundamental result in functional analysis that generalizes

Let $A$ be a square matrix and let $A^k$ be the $k$-th power of $A$.
-Let $r(A)$ be the dominant eigenvector or as it is commonly called the *spectral radius*, defined as $\max_i |\lambda_i|$, where
+Let $r(A)$ be the **spectral radius** of $A$, defined as $\max_i |\lambda_i|$, where

* $\{\lambda_i\}_i$ is the set of eigenvalues of $A$ and
* $|\lambda_i|$ is the modulus of the complex number $\lambda_i$

diff --git a/lectures/markov_chains_I.md b/lectures/markov_chains_I.md
index 6db19fc5..f396b23c 100644
--- a/lectures/markov_chains_I.md
+++ b/lectures/markov_chains_I.md
@@ -98,7 +98,7 @@ In other words,

If $P$ is a stochastic matrix, then so is the $k$-th power $P^k$ for all $k \in \mathbb N$.

-Checking this is {ref}`one of the exercises ` below.
+Checking this is {ref}`the first exercise ` below.

### Markov chains

@@ -255,11 +255,11 @@ We'll cover some of these applications below.

(mc_eg3)=
#### Example 3

-Imam and Temple {cite}`imampolitical` categorize political institutions into three types: democracy (D), autocracy (A), and an intermediate state called anocracy (N).
+Imam and Temple {cite}`imampolitical` categorize political institutions into three types: democracy $\text{(D)}$, autocracy $\text{(A)}$, and an intermediate state called anocracy $\text{(N)}$.

-Each institution can have two potential development regimes: collapse (C) and growth (G). This results in six possible states: DG, DC, NG, NC, AG, and AC.
+Each institution can have two potential development regimes: collapse $\text{(C)}$ and growth $\text{(G)}$. This results in six possible states: $\text{DG, DC, NG, NC, AG}$ and $\text{AC}$.

-The lower probability of transitioning from NC to itself indicates that collapses in anocracies quickly evolve into changes in the political institution.
+The lower probability of transitioning from $\text{NC}$ to itself indicates that collapses in anocracies quickly evolve into changes in the political institution.

Democracies tend to have longer-lasting growth regimes compared to autocracies as indicated by the lower probability of transitioning from growth to growth in autocracies.
@@ -393,7 +393,7 @@ In these exercises, we'll take the state space to be $S = 0,\ldots, n-1$.

To simulate a Markov chain, we need

1. a stochastic matrix $P$ and
-1. a probability mass function $\psi_0$ of length $n$ from which to draw a initial realization of $X_0$.
+1. a probability mass function $\psi_0$ of length $n$ from which to draw an initial realization of $X_0$.

The Markov chain is then constructed as follows:
@@ -405,7 +405,7 @@ The Markov chain is then constructed as follows:

To implement this simulation procedure, we need a method for generating draws from a discrete distribution.

-For this task, we'll use `random.draw` from [QuantEcon](http://quantecon.org/quantecon-py).
+For this task, we'll use `random.draw` from [QuantEcon.py](http://quantecon.org/quantecon-py).

To use `random.draw`, we first need to convert the probability mass function
to a cumulative distribution
@@ -491,7 +491,7 @@ always close to 0.25 (for the `P` matrix above).

### Using QuantEcon's routines

-[QuantEcon.py](http://quantecon.org/quantecon-py) has routines for handling Markov chains, including simulation.
+QuantEcon.py has routines for handling Markov chains, including simulation.

Here's an illustration using the same $P$ as the preceding example
@@ -585,7 +585,7 @@ $$

There are $n$ such equations, one for each $y \in S$.
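+Before writing these equations in matrix form, here is a quick numerical check (a minimal sketch with an arbitrarily chosen stochastic matrix and distribution) that the $n$ sums are exactly the entries of a vector-matrix product:
+
+```{code-cell} ipython3
+import numpy as np
+
+# An arbitrary stochastic matrix and distribution, for illustration only
+P = np.array([[0.9, 0.1],
+              [0.4, 0.6]])
+ψ_t = np.array([0.25, 0.75])
+
+# Elementwise: ψ_{t+1}(y) = sum over x of P(x, y) ψ_t(x)
+ψ_next = np.array([sum(P[x, y] * ψ_t[x] for x in range(2)) for y in range(2)])
+
+print(ψ_next)
+print(ψ_t @ P)  # the same numbers as a single vector-matrix product
+```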
-If we think of $\psi_{t+1}$ and $\psi_t$ as *row vectors*, these $n$ equations are summarized by the matrix expression
+If we think of $\psi_{t+1}$ and $\psi_t$ as row vectors, these $n$ equations are summarized by the matrix expression

```{math}
:label: fin_mc_fr

\psi_{t+1} = \psi_t P
```

-Thus, to move a distribution forward one unit of time, we postmultiply by $P$.
+Thus, we postmultiply by $P$ to move a distribution forward one unit of time.

By postmultiplying $m$ times, we move a distribution forward $m$ steps into the future.
@@ -671,7 +671,7 @@ $$

The distributions we have been studying can be viewed either

1. as probabilities or
-1. as cross-sectional frequencies that a Law of Large Numbers leads us to anticipate for large samples.
+1. as cross-sectional frequencies that the Law of Large Numbers leads us to anticipate for large samples.

To illustrate, recall our model of employment/unemployment dynamics for a given worker {ref}`discussed above `.
@@ -788,7 +788,7 @@ Not surprisingly it tends to zero as $\beta \to 0$, and to one as $\alpha \to 0$.

### Calculating stationary distributions

-A stable algorithm for computing stationary distributions is implemented in [QuantEcon.py](http://quantecon.org/quantecon-py).
+A stable algorithm for computing stationary distributions is implemented in QuantEcon.py.

Here's an example
diff --git a/lectures/markov_chains_II.md b/lectures/markov_chains_II.md
index 6bfbf9a8..d4938a2a 100644
--- a/lectures/markov_chains_II.md
+++ b/lectures/markov_chains_II.md
@@ -209,6 +209,8 @@ Theorem 5.2 of {cite}`haggstrom2002finite`.

(ergodicity)=
## Ergodicity

+Note that we use $\mathbb{1}$ for a vector of ones in this lecture.
+
Under irreducibility, yet another important result obtains:

````{prf:theorem}
@@ -228,9 +230,9 @@ distribution, then, for all $x \in S$,

Here

* $\{X_t\}$ is a Markov chain with stochastic matrix $P$ and initial
  distribution $\psi_0$
-* $\mathbf{1}\{X_t = x\} = 1$ if $X_t = x$ and zero otherwise
+* $\mathbb{1}\{X_t = x\} = 1$ if $X_t = x$ and zero otherwise

The result in [theorem 4.3](llnfmc0) is sometimes called **ergodicity**.
@@ -242,7 +244,7 @@ This gives us another way to interpret the stationary distribution (provided irr

Importantly, the result is valid for any choice of $\psi_0$.

-The theorem is related to {doc}`the law of large numbers `.
+The theorem is related to {doc}`the Law of Large Numbers `.

It tells us that, in some settings, the law of large numbers sometimes holds even when the sequence of random variables is [not IID](iid_violation).
@@ -394,7 +396,7 @@ Unlike other Markov chains we have seen before, it has a periodic cycle --- the

This is called [periodicity](https://www.randomservices.org/random/markov/Periodicity.html).

-It is still irreducible, however, so ergodicity holds.
+It is still irreducible, so ergodicity holds.

```{code-cell} ipython3
P = np.array([[0, 1],
@@ -424,7 +426,7 @@ for i in range(n):

plt.show()
```

-This example helps to emphasize the fact that asymptotic stationarity is about the distribution, while ergodicity is about the sample path.
+This example helps to emphasize that asymptotic stationarity is about the distribution, while ergodicity is about the sample path.

The proportion of time spent in a state can converge to the stationary distribution with periodic chains.
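+To see this numerically, here is a minimal sketch (reusing the periodic matrix above and QuantEcon's `MarkovChain` class) that compares the fraction of time spent in each state with the stationary distribution:
+
+```{code-cell} ipython3
+import numpy as np
+import quantecon as qe
+
+# The same periodic chain as above, redefined so this cell is self-contained
+P = np.array([[0, 1],
+              [1, 0]])
+
+mc = qe.MarkovChain(P)
+X = mc.simulate(ts_length=10_000)
+
+# Fraction of time spent in state 1 versus the stationary probability 0.5
+print(X.mean())
+print(mc.stationary_distributions[0])
+```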