4.0 Introduction
- Goal : We want to obtain a diagonal matrix $D$ from a given square matrix $A$ in the form $D = V^{-1}AV$, for reasons such as computational efficiency.
- The above diagonalization process is also called eigendecomposition ($A = VDV^{-1}$), because from the equivalent equation $AV = VD$ we can find the following :
  - $D$ is a diagonal matrix with the eigenvalues of $A$ as its diagonal entries
  - $V$ is a matrix whose column vectors are the corresponding eigenvectors
  - $A$ is a given square matrix $A \in \mathbb{R}^{n \times n}$
- Consider the linear transformation $T(\mathbf{x}) = A \mathbf{x} = VDV^{-1} \mathbf{x}$; it can be seen as a composition of the following transformations (4.5.1 + 4.5.2 + 4.5.3) :
  - Change of Basis
  - Element-wise Scaling
  - Back to Original Basis
4.1 Eigenvectors and Eigenvalues
- An eigenvector of a square matrix $A \in \mathbb{R}^{n \times n}$ is a nonzero vector $\mathbf{x} \in \mathbb{R}^n$ such that $A \mathbf{x} = \lambda \mathbf{x}$ for some scalar $\lambda$. In this case, $\lambda$ is called an eigenvalue of $A$, and such an $\mathbf{x}$ is called an eigenvector corresponding to $\lambda$.
- $A \mathbf{x}$ can be considered as a linear transformation $T(\mathbf{x})$. If $\mathbf{x}$ is an eigenvector, then $T(\mathbf{x}) = A \mathbf{x} = \lambda \mathbf{x}$, which means the output vector has the same direction as $\mathbf{x}$, but its length is scaled by a factor of $\lambda$.
- For example, $\begin{bmatrix} 2 & 6 \\ 5 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 8 \\ 8 \end{bmatrix} = 8 \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and computing $8 \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is faster than the full matrix-vector product (see the numerical check at the end of this section).
- The equation $A \mathbf{x} = \lambda \mathbf{x}$ can be re-written as $$(A-\lambda I)\mathbf{x} = \mathbf{0}$$
- $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$ if and only if this equation has a nontrivial solution (since $\mathbf{x}$ should be a nonzero vector).
- The set of all solutions of the above equation is the null space of the matrix $A-\lambda I$, which we call the eigenspace of $A$ corresponding to $\lambda$.
- The eigenspace consists of the zero vector and all the eigenvectors corresponding to $\lambda$, satisfying the above equation.
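As a quick numerical check of the example above, here is a minimal NumPy sketch (my own illustration, not part of the original notes):

```python
import numpy as np

# Check that [1, 1] is an eigenvector of A with eigenvalue 8.
A = np.array([[2.0, 6.0],
              [5.0, 3.0]])
x = np.array([1.0, 1.0])

Ax = A @ x           # full matrix-vector product: [8, 8]
lam_x = 8 * x        # scaling by the eigenvalue instead: [8, 8]

print(np.allclose(Ax, lam_x))  # True: A x = 8 x
```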
4.2 Null Space
- Null Space : The null space of a matrix $A \in \mathbb{R}^{m\times n}$ is the set of all solutions of a homogeneous linear system, $A\mathbf{x} = \mathbf{0}$
- For $A = \begin{bmatrix} \mathbf{a}_1^T \\ \vdots \\ \mathbf{a}_m^T \end{bmatrix}$, $\mathbf{x}$ should satisfy $\mathbf{a}_1^T\mathbf{x} = \mathbf{a}_2^T\mathbf{x} = \cdots = \mathbf{a}_m^T\mathbf{x} = 0$. That is, $\mathbf{x}$ should be orthogonal to every row vector in $A$ (see the sketch below).
- Orthogonal Complement : The set of all vectors $\mathbf{z}$ that are orthogonal to every vector in a subspace $W$ is called the orthogonal complement of $W$ and is denoted by $W^\perp$.
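A minimal sketch of how a null-space basis can be computed numerically (my own illustration, assuming NumPy; SciPy also provides `scipy.linalg.null_space`). The right singular vectors with zero singular value span the null space, and each of them is orthogonal to every row of $A$:

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Return a basis of the null space of A as columns of a matrix."""
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))      # numerical rank of A
    return Vt[rank:].T               # remaining right singular vectors

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])      # rank 1, so the null space is 2-dimensional
N = null_space_basis(A)
print(np.allclose(A @ N, 0))         # True: every column of N solves A x = 0
```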
4.3 Characteristic Equation
$$(A - \lambda I)\mathbf{x} = \mathbf{0}$$
- The set of all solutions of the above equation = the null space of the matrix $A - \lambda I$ = the eigenspace of $A$ corresponding to $\lambda$.
- The eigenspace consists of the zero vector and all the eigenvectors corresponding to $\lambda$.
- How can we find the eigenvalues?
- If $(A-\lambda I)\mathbf{x} = \mathbf{0}$ has a nontrivial solution, then the matrix $A - \lambda I$ should be noninvertible.
- If it is invertible, $\mathbf{x}$ cannot be a nonzero vector, since
$$ (A - \lambda I)^{-1} (A - \lambda I) \mathbf{x} = (A - \lambda I)^{-1} \mathbf{0} \longrightarrow \mathbf{x} = \mathbf{0} $$
- Thus, we can obtain the eigenvalues by solving the characteristic equation :
$$ \det(A - \lambda I) = 0 $$
- Also, the solution is not unique, and thus $A-\lambda I$ has linearly dependent columns.
- Once the eigenvalues are obtained, we compute the eigenvectors for each $\lambda$ by solving
$$ (A-\lambda I)\mathbf{x} = \mathbf{0} $$
- Eigenspace : Note that the dimension of the eigenspace (corresponding to a particular $\lambda$) can be more than one. In this case, any vector in the eigenspace satisfies
$$ T(\mathbf{x}) = A\mathbf{x} = \lambda\mathbf{x} $$
- In summary, we can find all the possible eigenvalues and eigenvectors as follows (a numerical sketch follows the two steps).
- First, find all the eigenvalues by solving the characteristic equation : $\det(A-\lambda I) = 0$
- Second, for each eigenvalue $\lambda$, solve $(A-\lambda I)\mathbf{x} = \mathbf{0}$ and obtain a set of basis vectors of the corresponding eigenspace.
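The two steps can be mirrored numerically; below is a rough sketch (my own illustration, assuming NumPy; `np.poly` returns the characteristic polynomial coefficients of a square matrix):

```python
import numpy as np

A = np.array([[2.0, 6.0],
              [5.0, 3.0]])

# Step 1: eigenvalues as roots of det(A - lambda I) = 0.
coeffs = np.poly(A)              # [1, -5, -24]  <->  lambda^2 - 5*lambda - 24
eigenvalues = np.roots(coeffs)   # [8, -3]

# Step 2: for each eigenvalue, a basis of the null space of (A - lambda I),
# taken from the right singular vectors with zero singular value.
for lam in eigenvalues:
    _, s, Vt = np.linalg.svd(A - lam * np.eye(2))
    basis = Vt[int(np.sum(s > 1e-10)):].T
    print(lam, basis.ravel())    # eigenvalue and a unit eigenvector
```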
4.4 Diagonalization
- Diagonal matrix : a matrix in which the entries outside the main diagonal are all zero
- Diagonalization : We want to change a given square matrix $A \in \mathbb{R}^{n\times n}$ into a diagonal matrix via the following form : $D = V^{-1} A V$
- For $A$ to be diagonalizable, an invertible $V$ should exist such that $V^{-1}AV$ is a diagonal matrix.
- How can we find an invertible $V$ and the resulting diagonal matrix $D$?
- $V = [\mathbf{v_1}, \mathbf{v_2}, \ldots, \mathbf{v_n}]$ where $\mathbf{v_i}$ are the column vectors of $V$
- $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$
$$ D = V^{-1}AV $$
$$ VD = AV $$
$$ AV = A[\mathbf{v_1}, \mathbf{v_2}, \ldots, \mathbf{v_n}] = [A\mathbf{v_1}, A\mathbf{v_2}, \ldots, A\mathbf{v_n}] \quad (\text{column-wise matrix multiplication}) $$
$$ VD = [\lambda_1 \mathbf{v_1}, \lambda_2 \mathbf{v_2}, \ldots, \lambda_n \mathbf{v_n}] $$
$$ AV = VD \longleftrightarrow [A\mathbf{v_1}, A\mathbf{v_2}, \ldots, A\mathbf{v_n}] = [\lambda_1 \mathbf{v_1}, \lambda_2 \mathbf{v_2}, \ldots, \lambda_n \mathbf{v_n}] $$
- From the above, we obtain
$$ A \mathbf{v_1} = \lambda_1 \mathbf{v_1}, \quad A \mathbf{v_2} = \lambda_2 \mathbf{v_2}, \quad \ldots, \quad A \mathbf{v_n} = \lambda_n \mathbf{v_n} $$
- Thus $\mathbf{v_1}, \mathbf{v_2}, \ldots, \mathbf{v_n}$ should be eigenvectors and $\lambda_1, \lambda_2, \ldots, \lambda_n$ should be eigenvalues.
- Then, for $VD = AV \longrightarrow D = V^{-1}AV$ to hold, $V$ should be invertible.
- In this case, the resulting diagonal matrix $D$ has the eigenvalues as its diagonal entries.
- Diagonalizable matrix : For $V$ to be invertible, $V$ should be a square matrix in $\mathbb{R}^{n \times n}$ and should have $n$ linearly independent columns. Hence, $A$ should have $n$ linearly independent eigenvectors (see the sketch below).
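A minimal diagonalization sketch (my own illustration, assuming NumPy): `np.linalg.eig` returns the eigenvalues and a matrix $V$ whose columns are eigenvectors; when those columns are linearly independent, $V$ is invertible and $V^{-1}AV$ comes out diagonal:

```python
import numpy as np

A = np.array([[2.0, 6.0],
              [5.0, 3.0]])

eigvals, V = np.linalg.eig(A)        # columns of V are eigenvectors of A

D = np.linalg.inv(V) @ A @ V         # should be diagonal
print(np.round(D, 10))               # diagonal entries are the eigenvalues
print(np.allclose(A, V @ np.diag(eigvals) @ np.linalg.inv(V)))  # True
```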
4.5 Eigendecomposition and Linear Transformation
- Eigendecomposition : If $A$ is diagonalizable, then $D = V^{-1}AV$, so we can write $A = VDV^{-1}$, which we call the eigendecomposition of $A$. So, $A$ being diagonalizable is equivalent to $A$ having an eigendecomposition.
- Suppose $A$ is diagonalizable and thus has the eigendecomposition $A = VDV^{-1}$. Consider the linear transformation $T(\mathbf{x}) = A \mathbf{x}$; it can be seen as a composition of the following transformations (4.5.1 + 4.5.2 + 4.5.3) :
$$ T(\mathbf{x}) = A\mathbf{x} = VDV^{-1}\mathbf{x} = V(D(V^{-1}\mathbf{x})) $$
4.5.1 Change of Basis
- Let $\mathbf{y} = V^{-1}\mathbf{x}$, then $V\mathbf{y} = \mathbf{x}$
- $\mathbf{y}$ is the new coordinate vector of $\mathbf{x}$ with respect to the new basis of eigenvectors $\{\mathbf{v_1}, \mathbf{v_2}\}$
- https://angeloyeo.github.io/2020/12/07/change_of_basis.html
4.5.2 Element-wise and Dimension-wise Scaling
- $T(\mathbf{x}) = V(D(V^{-1}\mathbf{x})) = V(D\mathbf{y})$
- Let $\mathbf{z} = D\mathbf{y}$. This computation is a simple element-wise scaling of $\mathbf{y}$ by the eigenvalues.
4.5.3 Back to Original Basis
- $T(\mathbf{x}) = V(D\mathbf{y}) = V\mathbf{z}$
- $\mathbf{z}$ is still a coordinate vector based on the new basis $\{\mathbf{v_1}, \mathbf{v_2}\}$
- $V\mathbf{z}$ converts $\mathbf{z}$ back to coordinates based on the original standard basis.
- That is, $V\mathbf{z}$ is a linear combination of $\mathbf{v_1}$ and $\mathbf{v_2}$ using the coefficient vector $\mathbf{z}$ :
$$ V\mathbf{z} = \begin{bmatrix} \mathbf{v_1} & \mathbf{v_2} \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = z_1\mathbf{v_1} + z_2\mathbf{v_2} $$
Example : $A = VDV^{-1} = \begin{bmatrix} 3 & -2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 3 & -2 \\ 1 & 1 \end{bmatrix}^{-1}, \text{ and } \mathbf{x} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$
$$ \mathbf{y} = \begin{bmatrix} 3 & -2 \\ 1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 4 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} $$
$$ \mathbf{z} = \begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 2 \end{bmatrix} $$
$$ V\mathbf{z} = \begin{bmatrix} 3 & -2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -2 \\ 2 \end{bmatrix} = \begin{bmatrix} -10 \\ 0 \end{bmatrix} $$
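The three steps of this example can be reproduced directly (a sketch assuming NumPy, not part of the original notes):

```python
import numpy as np

V = np.array([[3.0, -2.0],
              [1.0,  1.0]])
D = np.diag([-1.0, 2.0])
x = np.array([4.0, 3.0])

y = np.linalg.solve(V, x)   # 4.5.1 change of basis: y = V^{-1} x  -> [2, 1]
z = D @ y                   # 4.5.2 element-wise scaling          -> [-2, 2]
Tx = V @ z                  # 4.5.3 back to the standard basis    -> [-10, 0]
print(y, z, Tx)

A = V @ D @ np.linalg.inv(V)
print(np.allclose(A @ x, Tx))   # True: same as applying A directly
```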
4.5.4 Linear Transformation via $A^k$
- Now, consider the repeated transformation $A A \cdots A \mathbf{x} = A^k \mathbf{x}$
- If $A$ is diagonalizable, $A$ has the eigendecomposition
$$ A = VDV^{-1} $$
$$ A^k = (VDV^{-1})(VDV^{-1})(VDV^{-1}) \cdots (VDV^{-1}) = VD^kV^{-1} $$
- Here, $D^k$ is simply computed as $\mathrm{diag}(\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k)$
- It is much faster to compute $V(D^k(V^{-1}\mathbf{x}))$ than to compute $A^k \mathbf{x}$ directly, especially for large $k$ (see the sketch below).
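A small sketch of the speed-up idea (my own illustration, assuming NumPy), comparing $V(D^k(V^{-1}\mathbf{x}))$ against the direct repeated product:

```python
import numpy as np

V = np.array([[3.0, -2.0],
              [1.0,  1.0]])
D = np.diag([-1.0, 2.0])
A = V @ D @ np.linalg.inv(V)
x = np.array([4.0, 3.0])
k = 10

# Fast path: D^k is just the element-wise k-th powers of the eigenvalues.
Dk = np.diag(np.diag(D) ** k)               # diag((-1)^k, 2^k)
fast = V @ (Dk @ np.linalg.solve(V, x))

# Direct path: k repeated matrix multiplications.
direct = np.linalg.matrix_power(A, k) @ x
print(np.allclose(fast, direct))            # True
```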
Summary
- Eigenvalues and Eigenvectors
- Null space, Column space, and Orthogonal complement in $\mathbb{R}^n$
- Diagonalization and Eigendecomposition
- Linear transformation via Eigendecomposition