Mathematics for ML #1 | Introduction, Part I


1. Introduction and Motivation

  • Machine learning is about designing algorithms that automatically extract valuable information from data.
  • There are three concepts at the core of machine learning: data, a model, and learning.
    • Data: since machine learning is inherently data driven, data is at the core of every machine learning method.
    • Model: a function that maps inputs to real-valued outputs.
    • Learning: automatically finding patterns and structure in data by optimizing the parameters of the model (see the sketch below).
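
To make the three concepts concrete, here is a minimal sketch (not from the book) that fits a toy linear model to synthetic data with gradient descent; the names `predict`, `w`, `b` and the learning rate are illustrative choices:

```python
import numpy as np

# Data: noisy observations of a hidden linear relationship.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=100)

# Model: a function with parameters (w, b) mapping inputs to outputs.
def predict(w, b, x):
    return w * x + b

# Learning: optimize the parameters to minimize the mean squared error.
w, b = 0.0, 0.0
lr = 0.1  # step size for gradient descent (illustrative choice)
for _ in range(500):
    err = predict(w, b, x) - y
    w -= lr * np.mean(2 * err * x)  # d(MSE)/dw
    b -= lr * np.mean(2 * err)      # d(MSE)/db

print(f"learned w={w:.2f}, b={b:.2f} (true values: 2.0, 0.5)")
```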


1.1 Finding Words for Intuitions

  • Data as vectors: there are (at least) three different ways to think about vectors (see the sketch after this list):
    • a vector as an array of numbers (a computer science view),
    • a vector as an arrow with a direction and magnitude (a physics view),
    • a vector as an object that obeys addition and scaling (a mathematical view).
  • Model: a good model can be used to predict what would happen in the real world without performing real-world experiments.
  • Learning: we learn from available data by using numerical optimization methods, with the aim that the model performs well on unseen data.
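
A short NumPy sketch of the three views (the vectors `u` and `v` are arbitrary examples):

```python
import numpy as np

# Computer science view: a vector is an array of numbers.
u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

# Mathematical view: vectors obey addition and scaling, and any such
# combination yields another vector.
print(u + v)        # addition:           [4. 1.]
print(2.5 * u)      # scaling:            [2.5 5. ]
print(2.0 * u - v)  # linear combination: [-1. 5.]

# Physics view: an arrow with a direction and a magnitude (its norm).
print(np.linalg.norm(v))  # magnitude of v: sqrt(10) ≈ 3.1623
```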


1.2 Two Ways to Read This Book

  • Bottom-up: building up the concepts from foundational to more advanced.

  • Top-down: drilling down from practical needs to more basic requirements.

  • Contents

    • Part I is about Mathematics:

      • linear algebra: the study of vectors and matrices.
      • analytic geometry: the construction of similarity and distances.
      • matrix decomposition: operations that factorize matrices, which are extremely useful in ML.
      • probability theory: the quantification of uncertainty.
      • vector calculus: the concept of gradients, which we need for optimization.
      • optimization: finding maxima/minima of functions.
    • Part II is about Machine Learning:

      • linear regression: to find functions that map inputs $x$ to corresponding observed function values $y$; model fitting via maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation (a least-squares sketch follows this list).
      • dimensionality reduction: to find a compact, lower-dimensional representation of high-dimensional data $x$ (a PCA sketch also follows this list).
      • density estimation: to find a probability distribution that describes a given dataset. We will focus on Gaussian mixture models for this purpose and discuss an iterative scheme to find the parameters of this model.
      • classification: unlike regression, where the labels were real-valued, the labels in classification are integers, which requires special care.
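
As a small taste of Part II: under a Gaussian noise assumption, maximum likelihood estimation for linear regression reduces to least squares, $\theta = (X^\top X)^{-1} X^\top y$. A minimal sketch on synthetic data (the slope/intercept values are illustrative, not from the book):

```python
import numpy as np

# Synthetic data: y = 1.5*x - 0.7 plus Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=50)
y = 1.5 * x - 0.7 + rng.normal(scale=0.2, size=50)

# Design matrix with a bias column; under Gaussian noise the MLE for
# theta is the least-squares solution theta = (X^T X)^{-1} X^T y.
X = np.column_stack([x, np.ones_like(x)])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"MLE fit: slope={theta[0]:.2f}, intercept={theta[1]:.2f}")
```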
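
Likewise, principal component analysis (one standard dimensionality reduction method covered later in the book) can be sketched via the singular value decomposition; the synthetic data and the target dimensionality `k = 1` are illustrative assumptions:

```python
import numpy as np

# Synthetic 3-D data that varies mostly along a single direction.
rng = np.random.default_rng(2)
t = rng.normal(size=(200, 1))
X = t @ np.array([[3.0, 1.0, 0.5]]) + rng.normal(scale=0.05, size=(200, 3))

# PCA: center the data, then the top right-singular vectors of the
# centered matrix give the directions of maximal variance.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 1                                # target dimensionality
Z = Xc @ Vt[:k].T                    # compact k-dimensional codes
X_hat = Z @ Vt[:k] + X.mean(axis=0)  # reconstruction from the codes

print("fraction of variance explained:", (S[:k]**2).sum() / (S**2).sum())
```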


Appendix.

  • Key Sections

    • Section 8.3: Parameter Estimation

      • Maximum Likelihood Estimation and Maximum a Posteriori Estimation
    • Section 8.4: Probabilistic Modeling and Inference

      • Bayesian Inference and the Generative Process
    • Section 9.2: Parameter Estimation

      • Maximum Likelihood Estimation and Maximum a Posteriori Estimation