10. Dimensionality Reduction with PCA
Introduction: Working directly with high-dimensional data comes with some difficulties: it is hard to analyze, interpretation is difficult, visualization is nearly impossible, and storage of the data vectors can be expensive. In this chapter, we derive PCA from first principles, drawing on our understanding of basis and basis change (Sections 2.6.1 and 2.7.2), projections (Section 3.8), eigenvalues (Section 4.2), Gaussian distributions (Section 6.5), and constrained optimization (Section 7.2).
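As a concrete reference point for the derivation, here is a minimal numpy sketch of PCA via the eigendecomposition of the data covariance matrix (the function name `pca` and all variable names are illustrative, not from the book):

```python
import numpy as np

def pca(X, k):
    """Project the N x D data matrix X onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)        # center the data
    S = np.cov(X_centered, rowvar=False)   # D x D data covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh, since S is symmetric
    order = np.argsort(eigvals)[::-1]      # sort eigenvalues, largest first
    B = eigvecs[:, order[:k]]              # top-k principal directions (D x k)
    return X_centered @ B                  # N x k low-dimensional codes

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # toy 5-D data
Z = pca(X, 2)                              # compress to 2-D
print(Z.shape)                             # (100, 2)
```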
9. Linear Regression
In the following, we will apply the mathematical concepts from previous chapters to solve linear regression (curve fitting) problems.
In regression, we aim to find a function $f$ that maps inputs $x \in \mathbb{R}^D$ to corresponding function values $f(x) \in \mathbb{R}$.
We are given a set of training inputs $x_n$ and corresponding noisy observations $y_n=f(x_n) + \epsilon$,
where $\epsilon$ is an i.i.d. random variable that describes measurement and observation noise; throughout this chapter, we assume it is zero-mean Gaussian noise (other noise models are not considered further here). The task is then to infer the function $f$ that generated the data and generalizes well to function values at new input locations.
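To make this concrete, here is a minimal numpy sketch that generates data according to this model and recovers the parameters by least squares, which coincides with maximum likelihood under the zero-mean Gaussian noise assumption (the particular function and noise scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy observations y_n = f(x_n) + eps, with f(x) = 2x - 1
x = rng.uniform(-3, 3, size=50)
y = 2.0 * x - 1.0 + rng.normal(scale=0.5, size=50)  # zero-mean Gaussian noise

# Under zero-mean Gaussian noise, maximum likelihood estimation of the
# parameters reduces to least squares
Phi = np.column_stack([np.ones_like(x), x])   # design matrix with bias column
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta)                                  # close to [-1.0, 2.0]
```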
8. When Models Meet Data
In the first part of the book, we introduced the mathematics that form the foundations of many machine learning methods.
The second part of the book introduces four pillars of machine learning:
Regression (Chapter 9)
Dimensionality reduction (Chapter 10)
Density estimation (Chapter 11)
Classification (Chapter 12)

8.1 Data, Models, and Learning
Three major components of a machine learning system: data, models, and learning. Good models: should perform well on unseen data.
6. Probability and Distributions
Probability, loosely speaking, concerns the study of uncertainty. Probability can be thought of as the fraction of times an event occurs, or as a degree of belief about an event. We then would like to use this probability to measure the chance of something occurring in an experiment. In ML, we often quantify uncertainty in the data, uncertainty in the machine learning model, and uncertainty in the predictions produced by the model.
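The "fraction of times an event occurs" reading can be illustrated with a short simulation (a toy sketch, not from the book; the bias of 0.3 is arbitrary):

```python
import numpy as np

# Estimate P(heads) for a biased coin by its relative frequency
rng = np.random.default_rng(0)
flips = rng.random(100_000) < 0.3   # each flip is heads with probability 0.3
print(flips.mean())                 # relative frequency, close to 0.3
```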
1. Introduction and Motivation
Machine learning is about designing algorithms that automatically extract valuable information from data. There are three concepts at the core of machine learning: data, a model, and learning.
Data: since machine learning is inherently data-driven, data is at the core of machine learning.
Model: describes a function that maps inputs to real-valued outputs.
Learning: can be understood as a way to automatically find patterns and structure in data by optimizing the parameters of the model.
5.1 Singular Value Decomposition (SVD)
Singular value decomposition (SVD): a factorization of a real or complex matrix that generalizes the eigendecomposition (EVD) of a square normal matrix to any $m \times n$ matrix via an extension of the polar decomposition. $$ A = U \Sigma V^T$$
$A \in \mathbb{R}^{m \times n}$: a given rectangular matrix.
$U \in \mathbb{R}^{m \times m}$: an orthogonal matrix whose orthonormal columns (the left-singular vectors) provide an orthonormal basis of $\mathbb{R}^m$.
$\Sigma \in \mathbb{R}^{m \times n}$: a rectangular diagonal matrix with the non-negative singular values on its diagonal.
$V \in \mathbb{R}^{n \times n}$: an orthogonal matrix whose orthonormal columns (the right-singular vectors) provide an orthonormal basis of $\mathbb{R}^n$.
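A quick numerical check of the factorization with numpy (note that np.linalg.svd returns $V^T$ directly; the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])         # 2 x 3 rectangular matrix

U, s, Vt = np.linalg.svd(A)              # s holds the singular values
Sigma = np.zeros_like(A)                 # embed singular values into a 2 x 3 matrix
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))    # True: A = U Sigma V^T
```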
4.0 Introduction
Goal: we want to obtain a diagonalized matrix $D$ of a given matrix $A$ in the form $D = V^{-1}AV$, for reasons such as computational efficiency. This diagonalization process is also called eigendecomposition ($A = VDV^{-1}$), because rewriting the equation as $AV = VD$ shows the following:
$D$ is a diagonal matrix with the eigenvalues of $A$ as its diagonal entries.
$V$ is a matrix whose column vectors are the corresponding eigenvectors of $A$.
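A short numpy check of this relationship (a sketch assuming $A$ is diagonalizable; the matrix is arbitrary):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, V = np.linalg.eig(A)   # columns of V are eigenvectors of A
D = np.diag(eigvals)            # eigenvalues on the diagonal of D

print(np.allclose(A @ V, V @ D))                 # True: AV = VD
print(np.allclose(A, V @ D @ np.linalg.inv(V)))  # True: A = V D V^{-1}
```

One reason diagonalization saves computation: powers of $A$ become cheap, since $A^k = V D^k V^{-1}$ and raising a diagonal matrix to a power only requires raising its diagonal entries.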