Linear Algebra for ML #4 | Eigen Decomposition


4.0 Introduction

  • Goal : We want to get a diagonalized matrix $D$ of a given matrix $A$ in the form of $ D = V^{-1}AV$ for some reasons such as computation resource.

  • The above diagonalization process is also called eigendecomposition ($A = VDV^{-1}$) because we can find the followings from above equation, $VD=AV$

    1. $D$ is a diagonal matrix with eigenvalues in diagonal entries

    2. $V$ is a matrix whose column vectors are eigenvectors

    3. $A$ is a given square matrix $A \in \mathbb{R}^{n \times n}$

  • Consider the linear transformation $T(\mathbf{x}) = A \mathbf{x} = VDV^{-1} \mathbf{x} $, it can be seen to be a series of following transformation. (4.5.1+ 4.5.2 + 4.5.3)

    1. Change of Basis

    2. Element-wise Scaling

    3. Back to Original Basis



4.1 Eigenvectors and Eigenvalues

  • An eigenvector of a square matrix $A \in \mathbb{R}^{n \times n}$ is a nonzero vector $\mathbf{x} \in \mathbb{R}^n$ such that $A \mathbf{x} = \lambda \mathbf{x}$ for some scalar $\lambda$. In this case $\mathbf{\lambda}$ is called an eigenvalue of $A$, and such an $\mathbf{x}$ is called an eigenvector.

  • $A \mathbf{x}$ can be considered as a linear transformation $T( \mathbf{x})$. If $\mathbf{x} $ is an eigenvector, then $T(x) = A \mathbf{x} = \lambda \mathbf{x} $, which means the output vector has the same direction as $\mathbf{x}$, but the length is scaled by a factor of $\lambda$

Fig1. Example of eigenvector and eigenvalue

  • And here, $8 \begin{bmatrix} 1 \\ 1 \end{bmatrix} $ is faster than $ \begin{bmatrix} 2\ 6 \\ 5 \ 3 \end{bmatrix}\begin{bmatrix} 1\ 1 \end{bmatrix} $

  • The equation $A \mathbf{x} = \lambda \mathbf{x}$ can be re-written as $$(A-\lambda I)\mathbf{x}=\mathbf{0} $$

  • $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$ if and only if this equation has a nontrivial solution. (since $\mathbf{x}$ should be a nonzero vector).

  • The set of all solutions of the above equation is the null space of the matrix $A-\lambda I$, which we call the eigenspace of A corresponding to $\lambda$

  • The eigenspace consists of the zero vector and all the eigenvectors corresponding to $\lambda$, satisfying the above equation.



4.2 Null Space

  • Null Space : The null space of a matrix $A \in \mathbb{R}^{m\times n}$ is the set of all solutions of a homogeneous linear system, $A \mathbf{x = 0}$
  • For $A = \begin{bmatrix} \mathbf{a_1^T} \\ … \\ \mathbf{a_m^T} \end{bmatrix}$, $\mathbf{x}$ should satify the following $ \mathbf{ a_1^Tx =a_2^Tx= … =a_m^Tx }=0$, That is, $\mathbf{x} $ should be orthogonal to every row vector in $A$
  • Orthogonal Complement : The set of all vectors $\mathbf{z}$ that are orthogonal to $W$ is called the orthogonal complement of $W$ and is denoted by $W ^ \perp$.


4.3 Characteristic equation

$$(A - \lambda I) \mathbf{x = 0}$$

  • The set of all solutions of the above equation = null space of the matrix $(A - \lambda I)$ = eigenspace of $A$ corresponding to $\lambda$

  • The eigensapce consists of the zero vector and all the engienvectors corresponding to $\lambda$

  • How can we find the eigenvalues?

  • If $(A-\lambda I)\mathbf{x = 0}$ has a nontrivial solution, then the columns of $(A - \lambda I)$ should be noninvertible.

  • If it is invertible, $\mathbf{x}$ cannot be a nonzero vector since

$$ (A - \lambda I)^{-1} (A - \lambda I) \mathbf{x} = (A - \lambda I)^{-1} \mathbf{0} \longrightarrow \mathbf{x = 0} $$

  • Thus, we can obtain eignevalues by solving characteristic equation :

$$ det(A - \lambda I) = 0$$

  • Also, the solution is not unique, and thus $A-\lambda I$ has linearly dependent columns.

  • Once obtaining eigenvalues, we compute the eigenvectors for each $\lambda$ by solving : $$ (A-\lambda I) \mathbf{x = 0}$$

  • Eigenspace : Note that the dimension of the eigenspace (corresponding to a particular $\lambda$) can be more than one. In this case, any vector in the eigenspace satisfies

$$ T\mathbf{(x)} = A \mathbf{(x)} = \lambda \mathbf{(x)}$$


  • In summary, we can find all the possible eigenvalues and eignevectors, as follows.

  • First, find all the eigenvalue by solving the characteristic equation : $ det(A-\lambda I) = 0$

  • Second, for each eigenvalue $\lambda$, solve for $(A-\lambda I) \mathbf{x=0}$ and obtain the set of basis vectors of the corresponding eigenspace.



4.4 Diagonalization

  • Diagonal matrix : matrix in which the entries outside the main diagonal are all zero

  • Diagonalization : We want to change a given square matrix $A \in \mathbb{R}^{n\times n}$ into a diagonal matrix via the following form : $D = V^{-1} A V $

  • For $A$ to be diagonalizable, an invertible $V$ should exist such that $V^{-1}AV$

  • How can we find an invertible $V$ and the resulting diagonal matrix $D$

  • $V = [\mathbf{v_1, v_2 , … , v_n}]$ where $\mathbb{v_i}$ are column vectors of $V$

  • $D = diag[\lambda_1, \lambda_2 , … , \lambda_n ]$

$$ D = V^{-1}AV $$

$$ VD = AV$$

$$AV = A[ \mathbf{v_1, v_2 , … v_n }] = [A\mathbf{v_1}, A\mathbf{v_2},…, A\mathbf{v_n}], (\text{column vector matmul})$$

$$VD = [\lambda \mathbf{v_1},\lambda \mathbf{v_2}, … ,\lambda \mathbf{v_n}]$$

$$AV=VD \longleftrightarrow [A\mathbf{v_1}, A\mathbf{v_2},…, A\mathbf{v_n}] = [\lambda \mathbf{v_1},\lambda \mathbf{v_2}, … ,\lambda \mathbf{v_n}]$$

  • From above, we obtain

    $$A \mathbf{v_1} = \lambda \mathbf{v_1}, A \mathbf{v_2} = \lambda \mathbf{v_2}, … , A \mathbf{v_n} = \lambda \mathbf{v_n}$$

  • Thus $\mathbf{v_1, v_2, … v_n} $ should be eigenvectors and $\lambda_1, \lambda_2, …. \lambda_n$ should be eigenvalues.

  • Then, For $VD = AV \longrightarrow D = V^{-1}AV$ to be true, $V$ should invertible.

  • In this case, the resulting diagonal matrix D has eigenvalues as diagonal entries

  • Diagonalizable matrix : For $V$ to be invertible, $V$ should be a square matrix in $\mathbb{R}^{n \times n}$, and $V$ should have $n$ linearly independent columns : Hence, $A$ should have $n$ linearly independent eigenvectors.


4.5 Eigen-Decomposition and Linear Transfomation

  • Eigendecomposition : If $A$ is diagonalizable, we can write $A = V^{-1}DV$, and we can also write $A = VDV^{-1}$, which we call eigendecomposition of $A$. So, $A$ being diagonalizable is equilvalent to $A$ having Eigendecomposition.

  • Suppose $A$ is diagonalizable, thus having eigendecomposition $ A = VDV^{-1}$, Consider the linear transformation $T(\mathbf{x}) = A \mathbf{x}$, it can be seen to be a series of following transformation. (4.5.1+ 4.5.2 + 4.5.3)

    $$T(\mathbf{x}) = A\mathbf{x} = VDV^{-1}\mathbf{x} = V(D(V^{-1}\mathbf{x}))$$

4.5.1 Change of Basis


4.5.2 Element-wise and Dimension-Wise Scaling

  • $T(\mathbf{x}) = V(D(V^{-1}\mathbf{x})) = V(D\mathbf{y})$

  • Let $ z=D\mathbf{y}$. This computation is a simple Element-wise scaling of $\mathbf{y}$


4.5.3 Back to Original Basis

  • $T(\mathbf{x}) = V(D\mathbf{y}) = Vz$

  • $z$ is still a coordinate based on the new basis $\mathbf{v_1, v_2}$

  • $Vz$ converts $z$ to another coordinates based on the original standard basis.

  • That is, $Vz$ is a linear combination of $\mathbf{v_1}$ and $\mathbf{v2}$ using the coefficient vector $z$.

  • That is,

$$Vz = \begin{bmatrix} \mathbf{v_1} \ \mathbf{v_2} \end{bmatrix} \begin{bmatrix} z_1 \ z_2 \end{bmatrix} = \mathbf{v_1}z_1 + \mathbf{v_2}z_2$$


Fig3. Eigendecomposition as a series of transformation

Example : $A = VDV^{-1} = \begin{bmatrix} 3&{-2}\\ 1&1 \end{bmatrix} \begin{bmatrix} -1&0\\ 0&2 \end{bmatrix}\begin{bmatrix} 3&{-2}\\ 1&1 \end{bmatrix}^{-1}, \text{ and } \bf{x}=\begin{bmatrix} 4\\ 3 \end{bmatrix} $

$$ y = \begin{bmatrix} 3&{-2}\\ 1&1 \end{bmatrix}^{-1}\begin{bmatrix} 4\\ 3 \end{bmatrix} = \begin{bmatrix} 2\\ 1 \end{bmatrix} $$

$$ z = \begin{bmatrix} -1 & 0\\ 0 &2 \end{bmatrix} \begin{bmatrix} 2\\ 1 \end{bmatrix} = \begin{bmatrix} -2\\ 2 \end{bmatrix}$$

$$ Vz = \begin{bmatrix} 3&{-2}\\ 1&1 \end{bmatrix}\begin{bmatrix} -2\\ 2 \end{bmatrix} = \begin{bmatrix} -10\\ 0 \end{bmatrix} $$


4.5.4 Linaer Transformation via $A^k$

  • Now, consider recursive transformation $A \times A \times A … \times A \mathbf{x} = A^k \mathbf{x} $

  • If $A$ is diagonalizable, $A$ has Eigendecomposition

$$ A = VDV^{-1}$$

$$ A^k = (VDV^{-1})(VDV^{-1})(VDV^{-1})…(VDV^{-1}) = (VD{^k}V^{-1})$$

  • Here, $D^k$ is simply computed as $diag[\lambda_1^k, \lambda_2^k… , \lambda_n^k]$

  • It is much faster to compute $V(D^k(V^{-1}\mathbf{x}))$ than to compute $A^k \mathbf{x}$



Summary

  • Eigenvalue and Eigenvectors

  • Null space, Column space, and orthogonal complement in $\mathbb{R}^n$

  • Diagonalization and Eigendecomposition

  • Linear transforamtion via eigendecomposition