Linear Algebra for ML #2 | Linear System & Linear Transform


2.1 Linear Equation and Linear System

  • Linear Equation is an equation that can be written in the form $$ a_1x_1 + \cdots + a_nx_n = b $$
  • The above equation can be written as $ \textbf{a}^T \textbf{x} = b $.
  • Linear System is a collection of one or more linear equations
  • Identity Matrix : $I$ is a square matrix whose diagonal entries are all 1’s and all the other entries are zeros.

  • Inverse Matrix : $A^{-1}$ is defined such that $A^{-1}A = AA^{-1} = I$

For a 2×2 matrix $A = \begin{bmatrix} a&b\\ c&d \end{bmatrix}$:

$$A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d&{-b}\\ -c&a \end{bmatrix}$$

  • Determinant : $\det(A) = ad-bc$ determines whether $A$ is invertible; $A^{-1}$ exists if and only if $\det(A) \neq 0$

  • If $A$ is not invertible, $A \mathbf{x} = \mathbf{b}$ has either no solution or infinitely many solutions.



2.2 Linear Combination

  • For given vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_p$ and given scalars $c_1, c_2, \dots, c_p$,

    $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p$ is called a Linear Combination of the vectors with weights $c_1, \dots, c_p$.

  • Span : Span{$\mathbf{v}_1, \dots, \mathbf{v}_p$} is defined as the set of all linear combinations of $\mathbf{v}_1, \dots, \mathbf{v}_p$. That is, Span is the collection of all vectors that can be written in the form $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p$.
  • Geometric Description of Span : if $\mathbf{v}_1$ and $\mathbf{v}_2$ are vectors in $\mathbb{R}^3$ that are not multiples of each other, then Span{$\mathbf{v}_1, \mathbf{v}_2$} is the plane in $\mathbb{R}^3$ through the origin that contains $\mathbf{v}_1$ and $\mathbf{v}_2$

  • Geometric Interpretation of Vector Equation : we can determine whether a solution of a vector equation exists using the notion of span. A solution of the vector equation below exists only when $\mathbf{b} \in$ Span{$\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$}.

$$\mathbf{a}_1 x_1 + \mathbf{a}_2 x_2 + \mathbf{a}_3 x_3 = \mathbf{b} $$


  • Matrix Multiplications as Linear Combinations of Vectors :

$$ \begin{bmatrix} 60&1.7&1 \\ 65&1.6&0 \\ 55&1.8&1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 60\\ 65\\ 55\end{bmatrix} x_1 + \begin{bmatrix} 1.7 \\ 1.6 \\ 1.8 \end{bmatrix} x_2 + \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} x_3 $$

  • Matrix Multiplication as Column Combinations

$$ \begin{bmatrix} 1&1&0 \\ 1&0&1 \\ 1&-1&1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ 1\end{bmatrix}1 + \begin{bmatrix} 1\\ 0\\ -1\end{bmatrix}2 + \begin{bmatrix} 0\\ 1\\ 1\end{bmatrix}3 $$

  • Matrix Multiplication as Row Combinations

$$ \begin{bmatrix} 1&2&3 \end{bmatrix} \begin{bmatrix} 1&1&0 \\ 1&0&1 \\ 1&-1&1 \end{bmatrix} = 1 \times \begin{bmatrix} 1&1&0 \end{bmatrix} + 2\times \begin{bmatrix} 1& 0&1 \end{bmatrix} + 3 \times \begin{bmatrix} 1& -1& 1\end{bmatrix}$$

  • Matrix Multiplication as Sum of Outer Products

$$\begin{bmatrix} 1&1 \\ 1&-1 \\ 1&1 \end{bmatrix} \begin{bmatrix} 1& 2& 3 \\ 4&5&6 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ 1\end{bmatrix} \begin{bmatrix} 1& 2& 3\end{bmatrix} + \begin{bmatrix} 1\\ -1\\ 1\end{bmatrix} \begin{bmatrix} 4& 5& 6\end{bmatrix} $$



2.3 Linear Independence

  • When a solution of $A \mathbf{x} = \mathbf{b}$ exists, it is unique if the columns $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ of $A$ are linearly independent

  • Infinitely many solutions exist when a solution exists and $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ are linearly dependent.


  • Linear Independence (practical) : Given a set of vectors $\mathbf{v}_1, \dots, \mathbf{v}_p$, check whether some $\mathbf{v}_j$ can be represented as a linear combination of the previous vectors $\mathbf{v}_1, \dots, \mathbf{v}_{j-1}$. If at least one such $\mathbf{v}_j$ is found, then $\mathbf{v}_1, \dots, \mathbf{v}_p$ are linearly dependent.

  • Linear Independence (formal) : Consider $x_1 \mathbf{v}_1 + \cdots + x_p \mathbf{v}_p = \mathbf{0}$. Obviously, one solution is $\mathbf{x} = [0, \dots, 0]^T$, which we call the trivial solution. If this is the only solution, $\mathbf{v}_1, \dots, \mathbf{v}_p$ are linearly independent. If the system also has other, nontrivial solutions, $\mathbf{v}_1, \dots, \mathbf{v}_p$ are linearly dependent.


2.4 Basis of a Subspace

  • Subspace : a subset of $\mathbb{R}^n$ that is closed under linear combination

  • A subspace can always be represented as Span {$\mathbf{v}_1 , … , \mathbf{v}_p$}


  • Basis of a subspace : a set of vectors that satisfies both of the following
  1. Fully spans the given subspace $H$
  2. Linearly independent (i.e., no redundancy)
  • Non-Uniqueness of Basis : for a given subspace $H$, there are other sets of linearly independent vectors that also span $H$, so a basis is not unique

  • Dimension of Subspace : even though different bases exist for $H$, the number of vectors in any basis for $H$ is always the same. This number is the dimension of $H$, denoted $\dim H$.
  • Column Space of Matrix : the subspace spanned by the columns of $A$, denoted Col $A$
  • Rank of Matrix : the dimension of the column space of $A$
Example:

$$ A = \begin{bmatrix} 1&0&2 \\ 0&1&1 \\ 1&0&2 \end{bmatrix}, \quad \text{column vectors } \mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3 = \begin{bmatrix} 1\\ 0\\ 1\end{bmatrix} , \begin{bmatrix} 0\\ 1\\ 0\end{bmatrix}, \begin{bmatrix} 2\\ 1\\ 2\end{bmatrix} $$

  • The column space is the subspace spanned by the column vectors of $A$. To describe it, we need to find a set of basis vectors.

  • $\mathbf{a}_1$ and $\mathbf{a}_2$ are linearly independent of each other, while $\mathbf{a}_3 = 2\mathbf{a}_1 + \mathbf{a}_2$ can be represented as a linear combination of $\mathbf{a}_1$ and $\mathbf{a}_2$. As a result, $\mathbf{a}_3$ is dependent on $\mathbf{a}_1, \mathbf{a}_2$, and {$\mathbf{a}_1, \mathbf{a}_2$} becomes a basis for the column space of $A$.

  • Here, the rank of matrix $A$ is two, because its column space has a basis of only two linearly independent vectors, $\mathbf{a}_1$ and $\mathbf{a}_2$.

  • If a solution of $A \mathbf{x} = \mathbf{b}$ exists, then $\mathbf{b}$ must lie in the column space of $A$, because $\mathbf{b}$ is a linear combination of the column vectors:

$$ A\mathbf{x} = \begin{bmatrix} 1&0&2 \\ 0&1&1 \\ 1&0&2 \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3\end{bmatrix} = \begin{bmatrix} 1\\ 0\\ 1\end{bmatrix} x_1 + \begin{bmatrix} 0\\ 1\\ 0\end{bmatrix} x_2 + \begin{bmatrix} 2\\ 1\\ 2\end{bmatrix} x_3 = \begin{bmatrix} b_1\\ b_2\\ b_3\end{bmatrix} $$



2.5 Change of Basis

  • Generally, vectors are defined in Cartesian coordinates, meaning they are expressed in the standard basis (1,0), (0,1).
  • But sometimes we need vectors in a different coordinate system (basis) for reasons such as computational efficiency (e.g., eigendecomposition).
  • When the basis changes, the vector itself does not change, but its representation does, e.g., from (2,2) in the standard basis to (2,0) in the new basis.



2.6 Linear Transformation

  • A transformation, function, or mapping maps an input $x$ to an output $y$

  • Domain : Set of all the possible values of $x$

  • Co-Domain : Set of all the possible values of $y$

  • Range : Set of all the output values $T(x)$ actually attained by some $x$ in the domain; a subset of the co-domain


  • Linear Transformation : A transformation (or mapping) $T$ is linear if, for all vectors $\mathbf{u}, \mathbf{v}$ and scalars $c, d$ : $$ T(c \mathbf{u}+ d \mathbf{v}) = cT (\mathbf{u}) + dT(\mathbf{v})$$

  • Matrix of Linear Transformation : In general, let $T$ : $ \mathbb{R}^n \rightarrow \mathbb{R}^m $ be a linear transformation. Then $T$ can always be written as a matrix-vector multiplication, i.e., $T(\mathbf{x}) = A \mathbf{x}$, where the columns of $A$ are the images of the standard basis vectors: $A = \begin{bmatrix} T(\mathbf{e}_1) & \cdots & T(\mathbf{e}_n) \end{bmatrix}$

  • Here, the matrix $A$ is called the standard matrix of the linear transformation.



2.7 Linear Transformation in Neural Networks

  • Fully Connected Layers can be thought of as a linear transformation

  • Fully Connected Layers usually involve a bias term; that's why we call them affine layers rather than linear layers, since $T(\mathbf{x}) = W\mathbf{x} + \mathbf{b}$ with $\mathbf{b} \neq \mathbf{0}$ violates the linearity condition (see the sketch below)

  • Reference : colah’s blog



2.8 ONTO and ONE-TO-ONE

  • ONTO : A mapping $T$ : $\mathbb{R}^n \rightarrow \mathbb{R}^m$ is said to be onto $\mathbb{R}^m$ if each $\mathbf{b} \in \mathbb{R}^m$ is the image of at least one $\mathbf{x} \in \mathbb{R}^n$. That is, the range is equal to the co-domain.

  • ONE-TO-ONE : A mapping $T$ : $\mathbb{R}^n \rightarrow \mathbb{R}^m$ is said to be one-to-one if each $\mathbf{b} \in \mathbb{R}^m$ is the image of at most one $\mathbf{x} \in \mathbb{R}^n$. That is, each output vector in the range is mapped to by exactly one input vector.