1. Introduction and Motivation
- Machine learning is about designing algorithms that automatically extract valuable information from data.
- There are three concepts at the core of machine learning: data, a model, and learning.
- Data: machine learning is inherently data driven, so data lies at its core.
- Model: describes a function that maps inputs to real-valued outputs.
- Learning: the process of automatically finding patterns and structure in data by optimizing the parameters of the model.
1.1 Finding Words for Intuitions
- Data as vectors: there are (at least) three different ways to think about vectors:
- a vector as an array of numbers (computer science view),
- a vector as an arrow with a direction and magnitude (physics view),
- a vector as an object that obeys addition and scaling (a mathematical view).
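The three views coincide in practice; a minimal NumPy sketch (the specific numbers are illustrative assumptions):

```python
import numpy as np

# Computer-science view: a vector is an array of numbers.
v = np.array([3.0, 4.0])

# Physics view: an arrow with a direction and a magnitude (its norm).
magnitude = np.linalg.norm(v)  # sqrt(3^2 + 4^2) = 5.0

# Mathematical view: an object that obeys addition and scaling.
w = np.array([1.0, -2.0])
print(v + w)      # addition: [4. 2.]
print(2.0 * v)    # scaling:  [6. 8.]
print(magnitude)  # 5.0
```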
- Model: a good model can be used to predict what would happen in the real world without performing real-world experiments.
- Learning: we learn from available data using numerical optimization methods, with the aim that the model performs well on unseen data.
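Learning as numerical optimization can be sketched with gradient descent on a single-parameter model $y = \theta x$; the toy data and learning rate below are assumptions for illustration only:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])  # roughly y = 2x plus noise

theta = 0.0  # model parameter to learn
lr = 0.01    # learning rate
for _ in range(500):
    # Gradient of the squared error sum((theta * x - y)^2) w.r.t. theta.
    grad = 2 * np.sum((theta * x - y) * x)
    theta -= lr * grad

print(theta)  # converges close to 2.0
```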
1.2 Two Ways to Read This Book
- Bottom-up: building up the concepts from foundational to more advanced.
- Top-down: drilling down from practical needs to more basic requirements.
Contents
- Part I is about Mathematics:
- linear algebra: the study of vectors and matrices.
- analytic geometry: the construction of similarity and distances.
- matrix decomposition: operations on matrices that are extremely useful in ML.
- probability theory: the quantification of uncertainty.
- vector calculus: the concept of gradients.
- optimization: finding maxima/minima of functions.
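As a taste of why matrix decompositions are useful, the SVD factors any matrix into rotations and scalings and reconstructs it exactly; the matrix below is an illustrative assumption:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# Singular value decomposition: A = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(A)
A_rebuilt = U @ np.diag(S) @ Vt

print(S)                          # singular values: [4. 2.]
print(np.allclose(A, A_rebuilt))  # True: the factors reproduce A
```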
- Part II is about Machine Learning:
- linear regression: to find functions that map inputs $x$ to corresponding observed function values $y$; model fitting via MLE and MAP.
- dimensionality reduction: to find a compact, lower-dimensional representation of high-dimensional data $x$.
- density estimation: to find a probability distribution that describes a given dataset. We will focus on Gaussian mixture models, and discuss an iterative scheme to find the parameters of this model.
- classification: unlike regression, where the labels were real-valued, the labels in classification are integers, which requires special care.
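For linear regression, the MLE under Gaussian noise coincides with the least-squares solution; a hedged sketch on synthetic data (true slope 2.0 and bias 0.5 are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 0.5 + 0.05 * rng.standard_normal(50)  # noisy line

# Design matrix with a bias column; MLE = least-squares solution.
X = np.column_stack([x, np.ones_like(x)])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta)  # approximately [2.0, 0.5]
```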
- Appendix.
Key Sections
- Section 8.3 Parameter Estimation: Maximum Likelihood Estimation and Maximum a Posteriori.
- Section 8.4 Probabilistic Modeling and Inference: Bayesian Inference and the Generative Process.
- Section 9.2 Parameter Estimation: Maximum Likelihood Estimation and Maximum a Posteriori.