1. Introduction and Motivation
- Machine learning is about designing algorithms that automatically extract valuable information from data.
- There are three concepts at the core of machine learning: data, a model, and learning.
- Data: machine learning is inherently data driven, so data lies at its core.
- Model: describes a function that maps inputs to real-valued outputs.
- Learning: the process of automatically finding patterns and structure in data by optimizing the parameters of the model.
1.1 Finding Words for Intuitions
- Data as vectors: there are (at least) three different ways to think about vectors:
- a vector as an array of numbers (computer science view),
- a vector as an arrow with a direction and magnitude (physics view),
- a vector as an object that obeys addition and scaling (a mathematical view).
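The three views coincide in practice; a minimal NumPy sketch (the specific numbers are illustrative assumptions):

```python
import numpy as np

# Computer-science view: a vector is an array of numbers.
v = np.array([3.0, 4.0])

# Physics view: an arrow with a direction and a magnitude (its norm).
magnitude = np.linalg.norm(v)  # sqrt(3^2 + 4^2) = 5.0

# Mathematical view: an object that obeys addition and scaling.
w = np.array([1.0, -2.0])
print(v + w)      # addition: [4. 2.]
print(2.0 * v)    # scaling:  [6. 8.]
print(magnitude)  # 5.0
```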
- Model: a good model can be used to predict what would happen in the real world without performing real-world experiments.
- Learning: we learn from available data using numerical optimization methods, with the aim that the model performs well on unseen data.
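Learning as numerical optimization can be sketched with gradient descent on a single-parameter model $y = \theta x$; the toy data and learning rate below are assumptions for illustration only:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])  # roughly y = 2x plus noise

theta = 0.0  # model parameter to learn
lr = 0.01    # learning rate
for _ in range(500):
    # Gradient of the squared error sum((theta * x - y)^2) w.r.t. theta.
    grad = 2 * np.sum((theta * x - y) * x)
    theta -= lr * grad

print(theta)  # converges close to 2.0
```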
1.2 Two Ways to Read This Book
- Bottom-up: building up the concepts from foundational to more advanced.
- Top-down: drilling down from practical needs to more basic requirements.
Contents
- Part I is about Mathematics:
- linear algebra: the study of vectors and matrices.
- analytic geometry: the construction of similarity and distances.
- matrix decomposition: operations on matrices that are extremely useful in ML.
- probability theory: the quantification of uncertainty.
- vector calculus: the concept of gradients.
- optimization: finding maxima/minima of functions.
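As a taste of why matrix decompositions are useful, the SVD factors any matrix into rotations and scalings and reconstructs it exactly; the matrix below is an illustrative assumption:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# Singular value decomposition: A = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(A)
A_rebuilt = U @ np.diag(S) @ Vt

print(S)                          # singular values: [4. 2.]
print(np.allclose(A, A_rebuilt))  # True: the factors reproduce A
```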
- Part II is about Machine Learning:
- linear regression: to find functions that map inputs $x$ to corresponding observed function values $y$; model fitting via MLE and MAP.
- dimensionality reduction: to find a compact, lower-dimensional representation of high-dimensional data $x$.
- density estimation: to find a probability distribution that describes a given dataset. We will focus on Gaussian mixture models, and discuss an iterative scheme to find the parameters of this model.
- classification: unlike regression, where the labels were real-valued, the labels in classification are integers, which requires special care.
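For linear regression, the MLE under Gaussian noise coincides with the least-squares solution; a hedged sketch on synthetic data (true slope 2.0 and bias 0.5 are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 0.5 + 0.05 * rng.standard_normal(50)  # noisy line

# Design matrix with a bias column; MLE = least-squares solution.
X = np.column_stack([x, np.ones_like(x)])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta)  # approximately [2.0, 0.5]
```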
- Appendix.
Key Sections
- Section 8.3 Parameter Estimation: Maximum Likelihood Estimation and Maximum a Posteriori.
- Section 8.4 Probabilistic Modeling and Inference: Bayesian Inference and the Generative Process.
- Section 9.2 Parameter Estimation: Maximum Likelihood Estimation and Maximum a Posteriori.