2.1 Linear Equation and Linear System
- A linear equation is an equation that can be written in the form $$ a_1x_1 + \cdots + a_nx_n = b $$
- The above equation can be written as $ \textbf{a}^T \textbf{x} = b $.
- A linear system is a collection of one or more linear equations involving the same variables
- Identity Matrix : $I$ is a square matrix whose diagonal entries are all 1's and all the other entries are zeros.
- Inverse Matrix : $A^{-1}$ is defined such that $A^{-1}A = AA^{-1} = I$. For a $2 \times 2$ matrix $A = \begin{bmatrix} a&b \\ c&d \end{bmatrix}$ :
$$A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d&{-b}\\ -c&a \end{bmatrix}$$
- Determinant : in the $2 \times 2$ case, $\det(A) = ad-bc$ determines whether $A$ is invertible: $A^{-1}$ exists if and only if $\det(A) \neq 0$
- If $A$ is not invertible, $A \textbf{x} = \textbf{b}$ will have either no solution or infinitely many solutions.
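To tie the determinant, the inverse, and solvability together, here is a minimal NumPy sketch; the matrix and right-hand side are made-up values for illustration only.

```python
import numpy as np

# Illustrative 2x2 system (made-up values).
A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
b = np.array([1.0, 3.0])

det = np.linalg.det(A)              # ad - bc, up to floating-point error
if not np.isclose(det, 0.0):
    A_inv = np.linalg.inv(A)        # matches the 2x2 formula above
    x = np.linalg.solve(A, b)       # numerically preferable to A_inv @ b
    print(x)                        # the unique solution
else:
    print("A is not invertible: no solution or infinitely many")
```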
2.2 Linear Combination
- For given vectors $\textbf{v}_1, \textbf{v}_2, \ldots, \textbf{v}_p$ and given scalars $c_1, c_2, \ldots, c_p$, $c_1\textbf{v}_1 + c_2 \textbf{v}_2 + \cdots + c_p \textbf{v}_p$ is called a linear combination of the vectors with weights $c_1, \ldots, c_p$.
- Span : Span{$\textbf{v}_1, \ldots, \textbf{v}_p$} is defined as the set of all linear combinations of $\textbf{v}_1, \ldots, \textbf{v}_p$. That is, Span is the collection of all vectors that can be written in the form $c_1\textbf{v}_1 + c_2 \textbf{v}_2 + \cdots + c_p \textbf{v}_p$.
- Geometric Description of Span : if $\textbf{v}_1$ and $\textbf{v}_2$ are noncollinear vectors in $\mathbb{R}^3$, then Span{$\textbf{v}_1, \textbf{v}_2$} is the plane through the origin in $\mathbb{R}^3$ that contains $\textbf{v}_1$ and $\textbf{v}_2$
- Geometric Interpretation of Vector Equation : we can determine whether a vector equation has a solution using the notion of span. The vector equation below has a solution if and only if $\textbf{b} \in \text{Span}\{\textbf{a}_1, \textbf{a}_2, \textbf{a}_3\}$ (a quick numerical check follows the equation).
$$\textbf{a}_1 x_1 + \textbf{a}_2 x_2 + \textbf{a}_3 x_3 = \textbf{b} $$
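As a sketch of this criterion, the snippet below tests membership in the span by comparing ranks; the vectors $\textbf{a}_1, \textbf{a}_2, \textbf{a}_3$ and $\textbf{b}$ are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical vectors for illustration.
a1 = np.array([1.0, 1.0, 1.0])
a2 = np.array([1.0, 0.0, -1.0])
a3 = np.array([0.0, 1.0, 1.0])
b  = np.array([2.0, 3.0, 3.0])

A = np.column_stack([a1, a2, a3])
# b is in Span{a1, a2, a3} iff appending b as a column does not raise the rank.
in_span = np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)
print(in_span)  # True here: these three columns already span R^3
```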
- Matrix Multiplications as Linear Combinations of Vectors :
$$ \begin{bmatrix} 60&1.7&1 \\ 65&1.6&0 \\ 55&1.8&1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 60\\ 65\\ 55\end{bmatrix} x_1 + \begin{bmatrix} 1.7 \\ 1.6 \\ 1.8 \end{bmatrix} x_2 + \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} x_3 $$
- Matrix Multiplication as Column Combinations
$$ \begin{bmatrix} 1&1&0 \\ 1&0&1 \\ 1&-1&1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ 1\end{bmatrix}1 + \begin{bmatrix} 1\\ 0\\ -1\end{bmatrix}2 + \begin{bmatrix} 0\\ 1\\ 1\end{bmatrix}3 $$
- Matrix Multiplication as Row Combinations
$$ \begin{bmatrix} 1&2&3 \end{bmatrix} \begin{bmatrix} 1&1&0 \\ 1&0&1 \\ 1&-1&1 \end{bmatrix} = 1 \times \begin{bmatrix} 1&1&0 \end{bmatrix} + 2\times \begin{bmatrix} 1& 0&1 \end{bmatrix} + 3 \times \begin{bmatrix} 1& -1& 1\end{bmatrix}$$
- Matrix Multiplication as Sum of Outer Products
$$\begin{bmatrix} 1&1 \\ 1&-1 \\ 1&1 \end{bmatrix} \begin{bmatrix} 1& 2& 3 \\ 4&5&6 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ 1\end{bmatrix} \begin{bmatrix} 1& 2& 3\end{bmatrix} + \begin{bmatrix} 1\\ -1\\ 1\end{bmatrix} \begin{bmatrix} 4& 5& 6\end{bmatrix} $$
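All of these views can be verified numerically. The sketch below reuses the matrices from the examples above and checks the column, row, and outer-product views against plain matrix multiplication.

```python
import numpy as np

A = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, -1, 1]])
x = np.array([1, 2, 3])

# Column view: Ax is a combination of A's columns weighted by the entries of x.
col_view = A[:, 0] * 1 + A[:, 1] * 2 + A[:, 2] * 3
assert np.array_equal(A @ x, col_view)

# Row view: x^T A is a combination of A's rows weighted by the entries of x.
row_view = 1 * A[0] + 2 * A[1] + 3 * A[2]
assert np.array_equal(x @ A, row_view)

C = np.array([[1, 1],
              [1, -1],
              [1, 1]])
B = np.array([[1, 2, 3],
              [4, 5, 6]])
# Outer-product view: CB is the sum of outer products of C's columns with B's rows.
outer_view = np.outer(C[:, 0], B[0]) + np.outer(C[:, 1], B[1])
assert np.array_equal(C @ B, outer_view)
print("all three views agree")
```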
2.3 Linear Independence
- The solution of $A \textbf{x} = \textbf{b}$, when it exists, is unique when the columns $\textbf{a}_1, \textbf{a}_2, \textbf{a}_3$ of $A$ are linearly independent
- Infinitely many solutions exist when $\textbf{a}_1, \textbf{a}_2, \textbf{a}_3$ are linearly dependent (and $\textbf{b}$ lies in their span).
- Linear Independence (practical) : Given a set of vectors $\textbf{v}_1, \ldots, \textbf{v}_p$, check whether some $\textbf{v}_j$ can be represented as a linear combination of the previous vectors $\textbf{v}_1, \ldots, \textbf{v}_{j-1}$. If at least one such $\textbf{v}_j$ is found, then $\textbf{v}_1, \ldots, \textbf{v}_p$ are linearly dependent; otherwise they are linearly independent.
- Linear Independence (Formal) : Consider $x_1 \textbf{v}_1 + \cdots + x_p \textbf{v}_p = \textbf{0}$. Obviously, one solution is $\textbf{x} = [0 \cdots 0]^T$, which we call the trivial solution. If this is the only solution, $\textbf{v}_1, \ldots, \textbf{v}_p$ are linearly independent. If the system also has other, nontrivial solutions, $\textbf{v}_1, \ldots, \textbf{v}_p$ are linearly dependent.
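This test is easy to run numerically: stack the vectors as columns of a matrix $V$; then $V\textbf{x} = \textbf{0}$ has only the trivial solution iff $\text{rank}(V) = p$. A sketch with made-up vectors:

```python
import numpy as np

# Stack the vectors as columns of V (made-up values; here v3 = v1 + v2,
# so the set is linearly dependent).
V = np.column_stack([[1.0, 1.0, 0.0],
                     [0.0, 1.0, 1.0],
                     [1.0, 2.0, 1.0]])

p = V.shape[1]
# Vx = 0 has only the trivial solution iff rank(V) equals the number of vectors.
independent = np.linalg.matrix_rank(V) == p
print(independent)  # False
```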
2.4 Basis of a Subspace
- Subspace : defined as a subset of $\mathbb{R}^n$ closed under linear combination
- A subspace can always be represented as Span {$\mathbf{v}_1 , \ldots , \mathbf{v}_p$} for some set of vectors
- Basis of a subspace : a set of vectors that satisfies both of the following
    - Fully spans the given subspace $H$
    - Linearly independent (i.e., no redundancy)
- Non-Uniqueness of Basis : for a given subspace $H$, there are other sets of linearly independent vectors that span $H$, so a basis is not unique
- Dimension of Subspace : even though different bases exist for $H$, the number of vectors in any basis for $H$ is unique. This number is the dimension of $H$.
- Column Space of Matrix : the subspace spanned by the column vectors of $A$
- Rank of Matrix : the dimension of the column space of $A$
Details
$$ A = \begin{bmatrix} 1&0&2 \\ 0&1&1 \\ 1&0&2 \end{bmatrix}, \ \ \text{column vectors } a_1, a_2, a_3 = \begin{bmatrix} 1\\ 0\\ 1\end{bmatrix} , \begin{bmatrix} 0\\ 1\\ 0\end{bmatrix}, \begin{bmatrix} 2\\ 1\\ 2\end{bmatrix} $$
- The column space is the subspace spanned by the column vectors of matrix $A$, and to describe it we need to find a set of basis vectors.
- $a_1$ and $a_2$ are linearly independent of each other, while $a_3 = 2a_1 + a_2$ can be represented as a linear combination of $a_1$ and $a_2$, so $a_3$ is dependent on them. As a result, $a_1, a_2$ become the basis vectors for the column space of $A$.
- Here, the rank of matrix $A$ is two, because its column space has a basis of exactly two linearly independent vectors, $a_1$ and $a_2$.
- If a solution for $A \textbf{x} = \textbf{b}$ exists, then $\textbf{b}$ must be in the column space of $A$, because $A\textbf{x}$ is a linear combination of the column vectors:
$$ A\mathbf{x} = \begin{bmatrix} 1&0&2 \\ 0&1&1 \\ 1&0&2 \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3\end{bmatrix} = \begin{bmatrix} 1\\ 0\\ 1\end{bmatrix} x_1 + \begin{bmatrix} 0\\ 1\\ 0\end{bmatrix} x_2 + \begin{bmatrix} 2\\ 1\\ 2\end{bmatrix} x_3 = \begin{bmatrix} b_1\\ b_2\\ b_3\end{bmatrix} $$
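Both the rank and the membership test can be checked with NumPy. The sketch below uses the matrix $A$ from this example; the particular $\textbf{b}$ is arbitrary, constructed so that it lies in the column space.

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0]])

print(np.linalg.matrix_rank(A))   # 2, since a3 = 2*a1 + a2

# Construct a b that lies in Col A by design, then confirm Ax = b is solvable.
b = 3.0 * A[:, 0] + 1.0 * A[:, 1]
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(A @ x, b))      # True: b is in the column space
```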
2.5 Change of Basis
- Generally, vectors are written in Cartesian coordinates, which means they are represented with respect to the standard basis $(1,0)$, $(0,1)$.
- But sometimes we need vectors in a different coordinate system (basis), for reasons such as computational efficiency (e.g., eigendecomposition).
- When the basis changes, the vector itself does not change, but its representation does; for example, a vector written as $(2,2)$ in the standard basis may be written as $(2,0)$ in another basis, as in the sketch below.
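A small sketch of this computation, assuming the new basis vectors are $b_1 = (1,1)$ and $b_2 = (-1,1)$ (one choice that turns $(2,2)$ into $(2,0)$): the new coordinates $\mathbf{c}$ satisfy $B\mathbf{c} = \mathbf{v}$, where the columns of $B$ are the new basis vectors.

```python
import numpy as np

# Columns of B are the new basis vectors b1 = (1, 1), b2 = (-1, 1)
# (an assumed choice, picked so that (2, 2) becomes (2, 0)).
B = np.column_stack([[1.0, 1.0], [-1.0, 1.0]])
v_standard = np.array([2.0, 2.0])

# New coordinates c satisfy B c = v, so c = B^{-1} v.
c = np.linalg.solve(B, v_standard)
print(c)       # [2. 0.]  -- same vector, new representation
print(B @ c)   # [2. 2.]  -- back in standard coordinates
```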
2.6 Linear Transformation
- A transformation, function, or mapping $T$ maps an input $x$ to an output $y$
- Domain : the set of all possible values of $x$
- Co-domain : the set of all possible values of $y$
- Range : the set of all output values actually produced by some $x$ in the domain
- Linear Transformation : a transformation (or mapping) $T$ is linear if : $$ T(c \mathbf{u}+ d \mathbf{v}) = cT (\mathbf{u}) + dT(\mathbf{v})$$
- Matrix of Linear Transformation : in general, let $T$ : $\mathbb{R}^n \rightarrow \mathbb{R}^m$ be a linear transformation. Then $T$ can always be written as a matrix-vector multiplication, i.e., $T(\mathbf{x}) = A \mathbf{x}$
- Here, the matrix $A$ is called the standard matrix of the linear transformation. Its $j$-th column is $T(\mathbf{e}_j)$, where $\mathbf{e}_j$ is the $j$-th standard basis vector (see the sketch below).
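This gives a direct recipe for finding the standard matrix: apply $T$ to each standard basis vector and stack the results as columns. A sketch with a hypothetical linear map $T : \mathbb{R}^2 \rightarrow \mathbb{R}^3$ (the map itself is made up for illustration):

```python
import numpy as np

def T(x):
    # A hypothetical linear map R^2 -> R^3, for illustration only.
    return np.array([x[0] + x[1], x[0], x[0] - x[1]])

e1, e2 = np.eye(2)                   # standard basis vectors of R^2
A = np.column_stack([T(e1), T(e2)])  # standard matrix: column j is T(e_j)

x = np.array([3.0, -1.0])
print(np.allclose(T(x), A @ x))      # True: T(x) = Ax for every x
```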
2.7 Linear Transformation in Neural Networks
- Fully connected layers can be thought of as a linear transformation
- Fully connected layers usually involve a bias term; that's why we call them affine layers, not linear layers (a sketch follows below)
- Reference : colah's blog
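A minimal sketch of a fully connected layer as an affine map $\mathbf{y} = W\mathbf{x} + \mathbf{b}$; the shapes and random values are illustrative, not from any particular network. The final check shows why the bias breaks strict linearity.

```python
import numpy as np

# A minimal fully connected (affine) layer: y = Wx + b.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # weights: R^3 -> R^4
b = rng.standard_normal(4)        # bias term

def fc_layer(x):
    return W @ x + b

x = np.array([1.0, 2.0, 3.0])
# Because of the bias, the layer is affine, not linear:
print(np.allclose(fc_layer(2 * x), 2 * fc_layer(x)))  # False
```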
2.8 ONTO and ONE-TO-ONE
- ONTO : A mapping $T$ : $\mathbb{R}^n \rightarrow \mathbb{R}^m$ is said to be onto $\mathbb{R}^m$ if each $\mathbf{b} \in \mathbb{R}^m$ is the image of at least one $\mathbf{x} \in \mathbb{R}^n$. That is, the range is equal to the co-domain.
- ONE-TO-ONE : A mapping $T$ : $\mathbb{R}^n \rightarrow \mathbb{R}^m$ is said to be one-to-one if each $\mathbf{b} \in \mathbb{R}^m$ is the image of at most one $\mathbf{x} \in \mathbb{R}^n$. That is, each output vector in the range is the image of exactly one input vector.
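Both properties can be read off the rank of the standard matrix: for $T(\mathbf{x}) = A\mathbf{x}$ with $A$ of shape $m \times n$, $T$ is onto iff $\text{rank}(A) = m$ (the columns span $\mathbb{R}^m$) and one-to-one iff $\text{rank}(A) = n$ (the columns are linearly independent). A sketch with an illustrative matrix:

```python
import numpy as np

# Illustrative standard matrix with m = 3, n = 2.
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [1.0, -1.0]])
m, n = A.shape
r = np.linalg.matrix_rank(A)
print("onto:", r == m)         # False: 2 columns cannot span R^3
print("one-to-one:", r == n)   # True: the columns are independent
```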