Minor updates #197

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged: 4 commits, May 25, 2023
54 changes: 23 additions & 31 deletions lectures/eigen_I.md
@@ -25,7 +25,7 @@ kernelspec:

## Overview

Eigenvalues and eigenvectors are a somewhat advanced topic in linear and
Eigenvalues and eigenvectors are an advanced topic in linear and
matrix algebra.

At the same time, these concepts are extremely useful for
@@ -36,14 +36,7 @@ At the same time, these concepts are extremely useful for
* machine learning
* and many other fields of science.

In this lecture we explain the basics of eigenvalues and eigenvectors, and
state two very important results from linear algebra.

The first is called the Neumann series theorem and the second is called the
Perron-Frobenius theorem.

We will explain what these theorems tell us and how we can use them to
understand the predictions of economic models.
In this lecture we explain the basics of eigenvalues and eigenvectors.

We assume in this lecture that students are familiar with matrices and
understand the basics of matrix algebra.
@@ -89,10 +82,10 @@ a map transforming $x$ into $Ax$.

Because $A$ is $n \times m$, it transforms $m$-vectors into $n$-vectors.

We can write this formally as $A \colon \mathbb{R}^m \rightarrow \mathbb{R}^n$
We can write this formally as $A \colon \mathbb{R}^m \rightarrow \mathbb{R}^n$.

(You might argue that if $A$ is a function then we should write
$A(x) = y$ rather than $Ax = y$ but the second notation is more conventional.)
You might argue that if $A$ is a function then we should write
$A(x) = y$ rather than $Ax = y$ but the second notation is more conventional.
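
As a concrete illustration of such a map (the $2 \times 3$ matrix below is an arbitrary example, not one taken from the lecture), the following sketch sends a vector in $\mathbb{R}^3$ to a vector in $\mathbb{R}^2$:

```{code-cell} ipython3
import numpy as np

# An arbitrary 2 x 3 example matrix: n = 2 rows, m = 3 columns
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])

x = np.array([1.0, 1.0, 1.0])   # a vector in R^3
y = A @ x                       # A maps R^3 into R^2

print(y.shape)                  # (2,)
print(y)                        # [3. 4.]
```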

### Square matrices

@@ -101,7 +94,7 @@ Let's restrict our discussion to square matrices.
In the above discussion, this means that $m=n$ and $A$ maps $\mathbb R^n$ into
itself.

To repeat, $A$ is an $n \times n$ matrix that maps (or "transforms") a vector
This means $A$ is an $n \times n$ matrix that maps (or "transforms") a vector
$x$ in $\mathbb{R}^n$ into a new vector $y=Ax$ also in $\mathbb{R}^n$.

Here's one example:
@@ -183,8 +176,8 @@ plt.show()

One way to understand this transformation is that $A$

* first rotates $x$ by some angle $\theta$
* and then scales it by some scalar $\gamma$ to obtain the image $y$ of $x$.
* first rotates $x$ by some angle $\theta$ and
* then scales it by some scalar $\gamma$ to obtain the image $y$ of $x$.
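
One rough way to see this numerically is to compare the angle and length of $x$ with those of $y = Ax$; the matrix and vector below are illustrative choices only, and for a general $A$ the angle and scale factor depend on the particular $x$.

```{code-cell} ipython3
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])      # an illustrative 2 x 2 matrix
x = np.array([1.0, 0.0])
y = A @ x

# rotation angle and scale factor for this particular x
theta = np.arctan2(y[1], y[0]) - np.arctan2(x[1], x[0])
gamma = np.linalg.norm(y) / np.linalg.norm(x)

print(f"rotation theta = {theta:.4f} radians, scaling gamma = {gamma:.4f}")
```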



@@ -198,7 +191,7 @@ instead of arrows.
We consider how a given matrix transforms

* a grid of points and
* a set of points located on the unit circle in $\mathbb{R}^2$
* a set of points located on the unit circle in $\mathbb{R}^2$.

To build the transformations we will use two functions, called `grid_transform` and `circle_transform`.

@@ -498,10 +491,8 @@ same as first applying $B$ on $x$ and then applying $A$ on the vector $Bx$.

Thus the matrix product $AB$ is the
[composition](https://en.wikipedia.org/wiki/Function_composition) of the
matrix transformations $A$ and $B$.

(To compose the transformations, first apply transformation $B$ and then
transformation $A$.)
matrix transformations $A$ and $B$, which represents first applying transformation $B$ and then
transformation $A$.

When we matrix multiply an $n \times m$ matrix $A$ with an $m \times k$ matrix
$B$ the obtained matrix product is an $n \times k$ matrix $AB$.
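
Both facts are easy to confirm numerically; the matrices below are random examples used only for the shape and composition check.

```{code-cell} ipython3
import numpy as np

A = np.random.rand(2, 3)    # n x m with n = 2, m = 3
B = np.random.rand(3, 4)    # m x k with m = 3, k = 4
x = np.random.rand(4)       # a vector in R^k

print((A @ B).shape)                          # (2, 4): an n x k matrix
print(np.allclose((A @ B) @ x, A @ (B @ x)))  # True: (AB)x = A(Bx)
```
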
@@ -590,11 +581,11 @@ grid_composition_transform(B,A) #transformation BA

+++ {"user_expressions": []}

It is quite evident that the transformation $AB$ is not the same as the transformation $BA$.
It is evident that the transformation $AB$ is not the same as the transformation $BA$.

## Iterating on a fixed map

In economics (and especially in dynamic modeling), we often are interested in
In economics (and especially in dynamic modeling), we are often interested in
analyzing behavior where we repeatedly apply a fixed matrix.

For example, given a vector $v$ and a matrix $A$, we are interested in
@@ -603,7 +594,7 @@ studying the sequence
$$
v, \quad
Av, \quad
AAv = A^2v, \ldots
AAv = A^2v, \quad \ldots
$$
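
As a quick sketch of generating such a sequence (the matrix and initial vector below are placeholders, not the examples used next):

```{code-cell} ipython3
import numpy as np

A = np.array([[0.9, 0.1],
              [0.2, 0.8]])      # placeholder matrix
v = np.array([1.0, 0.0])        # placeholder initial vector

iterate = v
for k in range(5):
    print(f"A^{k} v = {iterate}")
    iterate = A @ iterate       # next element of the sequence
```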

Let's first see examples of a sequence of iterates $(A^k v)_{k \geq 0}$ under
@@ -721,13 +712,14 @@ In this section we introduce the notions of eigenvalues and eigenvectors.

Let $A$ be an $n \times n$ square matrix.

If $\lambda$ is scalar and $v$ is a non-zero $n$-vector such that
If $\lambda$ is scalar and $v$ is a non-zero $n$-vector such that

$$
A v = \lambda v
A v = \lambda v,
$$

then we say that $\lambda$ is an *eigenvalue* of $A$, and $v$ is an *eigenvector*.

then we say that $\lambda$ is an *eigenvalue* of $A$, and $v$ is the corresponding *eigenvector*.

Thus, an eigenvector of $A$ is a nonzero vector $v$ such that when the map $A$ is
applied, $v$ is merely scaled.
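
For example (an illustrative matrix, not necessarily one used in the lecture), the matrix $\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ has eigenvalue $3$ with eigenvector $(1, 1)^\top$, which can be checked directly:

```{code-cell} ipython3
import numpy as np

A = np.array([[1, 2],
              [2, 1]])       # illustrative example
v = np.array([1, 1])         # candidate eigenvector
lam = 3                      # candidate eigenvalue

print(A @ v)                         # [3 3]
print(lam * v)                       # [3 3]
print(np.allclose(A @ v, lam * v))   # True: A merely scales v
```
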
@@ -792,7 +784,7 @@ plt.show()

So far our definition of eigenvalues and eigenvectors seems straightforward.

There is, however, one complication we haven't mentioned yet:
There is one complication we haven't mentioned yet:

When solving $Av = \lambda v$,

@@ -812,7 +804,7 @@ The eigenvalue equation is equivalent to $(A - \lambda I) v = 0$.

This equation has a nonzero solution $v$ only when the columns of $A - \lambda I$ are linearly dependent.

This in turn is equivalent to stating that the determinant is zero.
This in turn is equivalent to stating the determinant is zero.

Hence, to find all eigenvalues, we can look for $\lambda$ such that the
determinant of $A - \lambda I$ is zero.
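
A minimal numerical sketch of this procedure, using the same illustrative matrix as in the check above:

```{code-cell} ipython3
import numpy as np

A = np.array([[1, 2],
              [2, 1]])

# coefficients of the characteristic polynomial det(A - λ I)
coeffs = np.poly(A)              # here λ^2 - 2λ - 3
eigenvalues = np.roots(coeffs)   # roots are 3 and -1
print(eigenvalues)

# each root makes det(A - λ I) (numerically) zero
for lam in eigenvalues:
    print(np.linalg.det(A - lam * np.eye(2)))
```
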
@@ -860,7 +852,7 @@ evecs #eigenvectors
Note that the *columns* of `evecs` are the eigenvectors.

Since any scalar multiple of an eigenvector is an eigenvector with the same
eigenvalue (check it), the eig routine normalizes the length of each eigenvector
eigenvalue (which can be verified), the `eig` routine normalizes the length of each eigenvector
to one.

The eigenvectors and eigenvalues of a map $A$ determine how a vector $v$ is transformed when we repeatedly multiply by $A$.
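
A brief check of these statements, again with a small illustrative matrix:

```{code-cell} ipython3
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
evals, evecs = np.linalg.eig(A)

# each *column* of evecs is an eigenvector, normalized to unit length
print(np.linalg.norm(evecs, axis=0))          # [1. 1.]
print(np.allclose(A @ evecs, evecs * evals))  # each column satisfies A v = λ v

# any scalar multiple of an eigenvector is still an eigenvector
v = 10 * evecs[:, 0]
print(np.allclose(A @ v, evals[0] * v))       # True
```
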
@@ -882,9 +874,9 @@ $$

A thorough discussion of the method can be found [here](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter15.02-The-Power-Method.html).

In this exercise, implement the power iteration method and use it to find the largest eigenvalue and its corresponding eigenvector.
In this exercise, first implement the power iteration method and use it to find the largest eigenvalue and its corresponding eigenvector.

Visualize the convergence.
Then visualize the convergence.
```
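
A minimal sketch of power iteration, with an arbitrary example matrix and without the convergence plot, might look like the following; the solution below may differ in details.

```{code-cell} ipython3
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])     # arbitrary example matrix

b = np.ones(2)                 # initial guess
for _ in range(50):
    b = A @ b
    b = b / np.linalg.norm(b)  # re-normalize at each step

dominant_eigenvalue = b @ A @ b    # Rayleigh quotient estimate (b has unit norm)
print(dominant_eigenvalue, b)      # ≈ 3 and ≈ [0.707, 0.707]
```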

```{solution-start} eig1_ex1
21 changes: 12 additions & 9 deletions lectures/eigen_II.md
@@ -50,7 +50,7 @@ Often, in economics, the matrix that we are dealing with is nonnegative.

Nonnegative matrices have several special and useful properties.

In this section we discuss some of them --- in particular, the connection
In this section we will discuss some of them --- in particular, the connection
between nonnegativity and eigenvalues.

Let $a^{k}_{ij}$ be element $(i,j)$ of $A^k$.
@@ -63,7 +63,7 @@ We denote this as $A \geq 0$.
(irreducible)=
### Irreducible matrices

We have (informally) introduced irreducible matrices in the Markov chain lecture (TODO: link to Markov chain lecture).
We have (informally) introduced irreducible matrices in the [Markov chain lecture](markov_chains_II.md).

Here we will introduce this concept formally.

@@ -157,9 +157,8 @@ This is a more common expression and where the name left eigenvectors originates
For a nonnegative matrix $A$ the behavior of $A^k$ as $k \to \infty$ is controlled by the eigenvalue with the largest
absolute value, often called the **dominant eigenvalue**.

For a matrix $A$, the Perron-Frobenius Theorem characterizes certain
properties of the dominant eigenvalue and its corresponding eigenvector when
$A$ is a nonnegative square matrix.
For a nonnegative square matrix $A$, the Perron-Frobenius Theorem characterizes certain
properties of the dominant eigenvalue and its corresponding eigenvector.

```{prf:Theorem} Perron-Frobenius Theorem
:label: perron-frobenius
@@ -179,7 +178,9 @@ If $A$ is primitive then,

6. the inequality $|\lambda| \leq r(A)$ is **strict** for all eigenvalues $\lambda$ of $A$ distinct from $r(A)$, and
7. with $v$ and $w$ normalized so that the inner product of $w$ and $v = 1$, we have
$ r(A)^{-m} A^m$ converges to $v w^{\top}$ when $m \rightarrow \infty$. $v w^{\top}$ is called the **Perron projection** of $A$.
$ r(A)^{-m} A^m$ converges to $v w^{\top}$ when $m \rightarrow \infty$.
The matrix $v w^{\top}$ is called the **Perron projection** of $A$.
```
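
A rough numerical check of items 6 and 7, using a small positive (hence primitive) matrix chosen purely for illustration:

```{code-cell} ipython3
import numpy as np

A = np.array([[0.5, 0.4],
              [0.3, 0.6]])              # a positive, hence primitive, matrix

evals, evecs = np.linalg.eig(A)
i = np.argmax(np.abs(evals))
r = evals[i].real                       # dominant eigenvalue r(A)
v = evecs[:, i].real                    # right eigenvector

evals_l, evecs_l = np.linalg.eig(A.T)   # left eigenvectors of A are right eigenvectors of A.T
w = evecs_l[:, np.argmax(np.abs(evals_l))].real
w = w / (w @ v)                         # normalize so the inner product of w and v is 1

perron_projection = np.outer(v, w)      # v w^T
m = 50
print(np.allclose(r**(-m) * np.linalg.matrix_power(A, m), perron_projection))  # True
```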

(This is a relatively simple version of the theorem --- for more details see
@@ -299,7 +300,7 @@ def check_convergence(M):

# Calculate the norm of the difference matrix
diff_norm = np.linalg.norm(diff, 'fro')
print(f"n = {n}, norm of the difference: {diff_norm:.10f}")
print(f"n = {n}, error = {diff_norm:.10f}")


A1 = np.array([[1, 2],
Expand Down Expand Up @@ -394,6 +395,8 @@ In the {ref}`exercise<mc1_ex_1>`, we stated that the convergence rate is determi

This can be proven using what we have learned here.

Please note that we use $\mathbb{1}$ for a vector of ones in this lecture.

With Markov model $M$ with state space $S$ and transition matrix $P$, we can write $P^t$ as

$$
@@ -402,7 +405,7 @@ $$

This is proven in {cite}`sargent2023economic` and a nice discussion can be found [here](https://math.stackexchange.com/questions/2433997/can-all-matrices-be-decomposed-as-product-of-right-and-left-eigenvector).

In the formula $\lambda_i$ is an eigenvalue of $P$ and $v_i$ and $w_i$ are the right and left eigenvectors corresponding to $\lambda_i$.
In this formula $\lambda_i$ is an eigenvalue of $P$ with corresponding right and left eigenvectors $v_i$ and $w_i$.
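
As a rough numerical sketch of this decomposition (with an arbitrary stochastic matrix): if the columns of $V$ hold the right eigenvectors, then the rows of $V^{-1}$ are left eigenvectors scaled so that $w_i^{\top} v_i = 1$, and the sum reproduces $P^t$.

```{code-cell} ipython3
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])      # arbitrary stochastic matrix

evals, V = np.linalg.eig(P)     # columns of V are right eigenvectors v_i
W = np.linalg.inv(V)            # rows of W are left eigenvectors w_i^T with w_i^T v_i = 1

t = 5
Pt = sum(evals[i]**t * np.outer(V[:, i], W[i, :]) for i in range(len(evals)))
print(np.allclose(Pt, np.linalg.matrix_power(P, t)))   # True
```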

Premultiplying $P^t$ by arbitrary $\psi \in \mathscr{D}(S)$ and rearranging now gives

@@ -485,7 +488,7 @@ The following is a fundamental result in functional analysis that generalizes

Let $A$ be a square matrix and let $A^k$ be the $k$-th power of $A$.

Let $r(A)$ be the dominant eigenvector or as it is commonly called the *spectral radius*, defined as $\max_i |\lambda_i|$, where
Let $r(A)$ be the **spectral radius** of $A$, defined as $\max_i |\lambda_i|$, where

* $\{\lambda_i\}_i$ is the set of eigenvalues of $A$ and
* $|\lambda_i|$ is the modulus of the complex number $\lambda_i$
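
For instance, $r(A)$ can be computed directly from this definition (the matrix below is an arbitrary example):

```{code-cell} ipython3
import numpy as np

A = np.array([[1.0, 2.0],
              [0.5, 0.3]])    # arbitrary example matrix

r = max(abs(lam) for lam in np.linalg.eigvals(A))   # spectral radius max_i |λ_i|
print(r)
```
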
22 changes: 11 additions & 11 deletions lectures/markov_chains_I.md
@@ -98,7 +98,7 @@ In other words,

If $P$ is a stochastic matrix, then so is the $k$-th power $P^k$ for all $k \in \mathbb N$.

Checking this is {ref}`one of the exercises <mc1_ex_3>` below.
Checking this is {ref}`the first exercise <mc1_ex_3>` below.
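
Before turning to that exercise, here is a quick numerical illustration with an example stochastic matrix:

```{code-cell} ipython3
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])    # an example stochastic matrix

for k in range(1, 4):
    Pk = np.linalg.matrix_power(P, k)
    # nonnegative entries and unit row sums mean P^k is again stochastic
    print(k, (Pk >= 0).all(), np.allclose(Pk.sum(axis=1), 1))
```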


### Markov chains
@@ -255,11 +255,11 @@ We'll cover some of these applications below.
(mc_eg3)=
#### Example 3

Imam and Temple {cite}`imampolitical` categorize political institutions into three types: democracy (D), autocracy (A), and an intermediate state called anocracy (N).
Imam and Temple {cite}`imampolitical` categorize political institutions into three types: democracy $\text{(D)}$, autocracy $\text{(A)}$, and an intermediate state called anocracy $\text{(N)}$.

Each institution can have two potential development regimes: collapse (C) and growth (G). This results in six possible states: DG, DC, NG, NC, AG, and AC.
Each institution can have two potential development regimes: collapse $\text{(C)}$ and growth $\text{(G)}$. This results in six possible states: $\text{DG, DC, NG, NC, AG}$ and $\text{AC}$.

The lower probability of transitioning from NC to itself indicates that collapses in anocracies quickly evolve into changes in the political institution.
The lower probability of transitioning from $\text{NC}$ to itself indicates that collapses in anocracies quickly evolve into changes in the political institution.

Democracies tend to have longer-lasting growth regimes compared to autocracies as indicated by the lower probability of transitioning from growth to growth in autocracies.

@@ -393,7 +393,7 @@ In these exercises, we'll take the state space to be $S = 0,\ldots, n-1$.
To simulate a Markov chain, we need

1. a stochastic matrix $P$ and
1. a probability mass function $\psi_0$ of length $n$ from which to draw a initial realization of $X_0$.
1. a probability mass function $\psi_0$ of length $n$ from which to draw an initial realization of $X_0$.

The Markov chain is then constructed as follows:

@@ -405,7 +405,7 @@ The Markov chain is then constructed as follows:
To implement this simulation procedure, we need a method for generating draws
from a discrete distribution.

For this task, we'll use `random.draw` from [QuantEcon](http://quantecon.org/quantecon-py).
For this task, we'll use `random.draw` from [QuantEcon.py](http://quantecon.org/quantecon-py).

To use `random.draw`, we first need to convert the probability mass function
to a cumulative distribution
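
As a sketch of that step (the probability mass function below is just an example), assuming the `quantecon` package is installed:

```{code-cell} ipython3
import numpy as np
import quantecon as qe

ψ_0 = (0.3, 0.7)              # an example probability mass function
cdf = np.cumsum(ψ_0)          # convert it to a cumulative distribution

# draw an initial realization of X_0 (an index in {0, 1}) from ψ_0
X_0 = qe.random.draw(cdf)
print(X_0)
```
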
@@ -491,7 +491,7 @@ always close to 0.25 (for the `P` matrix above).

### Using QuantEcon's routines

[QuantEcon.py](http://quantecon.org/quantecon-py) has routines for handling Markov chains, including simulation.
QuantEcon.py has routines for handling Markov chains, including simulation.

Here's an illustration using the same $P$ as the preceding example
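
A sketch of what such an illustration involves is below; the transition matrix here is an assumed example rather than a matrix copied from the earlier cells.

```{code-cell} ipython3
import numpy as np
import quantecon as qe

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])            # assumed example transition matrix

mc = qe.MarkovChain(P)
X = mc.simulate(ts_length=100_000)    # a long sample path
print(np.mean(X == 0))                # fraction of time spent in state 0
```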

@@ -585,15 +585,15 @@ $$

There are $n$ such equations, one for each $y \in S$.

If we think of $\psi_{t+1}$ and $\psi_t$ as *row vectors*, these $n$ equations are summarized by the matrix expression
If we think of $\psi_{t+1}$ and $\psi_t$ as row vectors, these $n$ equations are summarized by the matrix expression

```{math}
:label: fin_mc_fr

\psi_{t+1} = \psi_t P
```

Thus, to move a distribution forward one unit of time, we postmultiply by $P$.
Thus, we postmultiply by $P$ to move a distribution forward one unit of time.

By postmultiplying $m$ times, we move a distribution forward $m$ steps into the future.
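
A small numerical sketch of pushing a distribution forward (the matrix and initial distribution are example values only):

```{code-cell} ipython3
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])            # example stochastic matrix
ψ_0 = np.array([0.5, 0.5])            # example initial distribution (a row vector)

ψ_1 = ψ_0 @ P                               # one step forward: ψ_{t+1} = ψ_t P
ψ_10 = ψ_0 @ np.linalg.matrix_power(P, 10)  # ten steps forward

print(ψ_1, ψ_10)
```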

@@ -671,7 +671,7 @@ $$
The distributions we have been studying can be viewed either

1. as probabilities or
1. as cross-sectional frequencies that a Law of Large Numbers leads us to anticipate for large samples.
1. as cross-sectional frequencies that the Law of Large Numbers leads us to anticipate for large samples.

To illustrate, recall our model of employment/unemployment dynamics for a given worker {ref}`discussed above <mc_eg1>`.

@@ -788,7 +788,7 @@ Not surprisingly it tends to zero as $\beta \to 0$, and to one as $\alpha \to 0$.

### Calculating stationary distributions

A stable algorithm for computing stationary distributions is implemented in [QuantEcon.py](http://quantecon.org/quantecon-py).
A stable algorithm for computing stationary distributions is implemented in QuantEcon.py.

Here's an example
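
A sketch of such a computation with an example matrix (the lecture's own cell may use different inputs):

```{code-cell} ipython3
import numpy as np
import quantecon as qe

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])            # example stochastic matrix

mc = qe.MarkovChain(P)
print(mc.stationary_distributions)    # each row is a stationary distribution ψ*
# here [[0.25 0.75]], which satisfies ψ* P = ψ*
```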

12 changes: 7 additions & 5 deletions lectures/markov_chains_II.md
@@ -209,6 +209,8 @@ Theorem 5.2 of {cite}`haggstrom2002finite`.
(ergodicity)=
## Ergodicity

Please note that we use $\mathbb{1}$ for a vector of ones in this lecture.

Under irreducibility, yet another important result obtains:

````{prf:theorem}
@@ -228,9 +230,9 @@ distribution, then, for all $x \in S$,

Here

* $\{X_t\}$ is a Markov chain with stochastic matrix $P$ and initial
* $\{X_t\}$ is a Markov chain with stochastic matrix $P$ and initial
distribution $\psi_0$
* $\mathbf{1}\{X_t = x\} = 1$ if $X_t = x$ and zero otherwise
* $\mathbb{1} \{X_t = x\} = 1$ if $X_t = x$ and zero otherwise.

The result in [theorem 4.3](llnfmc0) is sometimes called **ergodicity**.

@@ -242,7 +244,7 @@ This gives us another way to interpret the stationary distribution (provided irr

Importantly, the result is valid for any choice of $\psi_0$.

The theorem is related to {doc}`the law of large numbers <lln_clt>`.
The theorem is related to {doc}`the Law of Large Numbers <lln_clt>`.

It tells us that, in some settings, the law of large numbers sometimes holds even when the
sequence of random variables is [not IID](iid_violation).
@@ -394,7 +396,7 @@ Unlike other Markov chains we have seen before, it has a periodic cycle --- the

This is called [periodicity](https://www.randomservices.org/random/markov/Periodicity.html).

It is still irreducible, however, so ergodicity holds.
It is still irreducible, so ergodicity holds.

```{code-cell} ipython3
P = np.array([[0, 1],
@@ -424,7 +426,7 @@ for i in range(n):
plt.show()
```

This example helps to emphasize the fact that asymptotic stationarity is about the distribution, while ergodicity is about the sample path.
This example helps to emphasize that asymptotic stationarity is about the distribution, while ergodicity is about the sample path.

The proportion of time spent in a state can converge to the stationary distribution with periodic chains.
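
A quick numerical sketch of this point, assuming the periodic transition matrix from the cell above is $P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$:

```{code-cell} ipython3
import numpy as np

P = np.array([[0, 1],
              [1, 0]], dtype=float)   # assumed periodic two-state chain
ψ = np.array([1.0, 0.0])              # start with all mass on state 0

# the distribution ψ_t oscillates and never settles down ...
for t in range(4):
    print(f"ψ_{t} = {ψ}")
    ψ = ψ @ P

# ... yet the fraction of time spent in each state converges to ψ* = (0.5, 0.5)
rng = np.random.default_rng(0)
T, X, counts = 10_000, 0, np.zeros(2)
for t in range(T):
    counts[X] += 1
    X = rng.choice(2, p=P[X])         # deterministic here, but the code is general
print(counts / T)                     # ≈ [0.5 0.5]
```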

Expand Down