ML Basic #7 | Sequential Modeling

0. Introduction

  • one to one : Vanilla mode of processing without an RNN, from fixed-sized input to fixed-sized output
  • one to many : Sequence output (e.g. image captioning)
  • many to one : Sequence input (e.g. sentiment analysis)
  • many to many : Sequence input and sequence output (e.g. Machine Translation)

Figure 1. Overview of Sequence Processing
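
To make the four patterns above concrete, here is a minimal sketch (not from the original notes) that only illustrates input/output tensor shapes; the dimensions d, T, and k are arbitrary toy values chosen for illustration.

```python
import numpy as np

# Toy dimensions (illustrative assumptions): feature size d, sequence length T, k classes
d, T, k = 16, 10, 5

patterns = {
    "one to one":   (np.zeros(d),      np.zeros(k)),       # fixed-size input -> fixed-size output
    "one to many":  (np.zeros(d),      np.zeros((T, k))),  # e.g. image captioning
    "many to one":  (np.zeros((T, d)), np.zeros(k)),       # e.g. sentiment analysis
    "many to many": (np.zeros((T, d)), np.zeros((T, k))),  # e.g. machine translation
}

for name, (x, y) in patterns.items():
    print(f"{name}: input {x.shape} -> output {y.shape}")
```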


1. Recurrent Neural Networks

  • Introduction : Imagine you want to classify what kind of action is happening at every point in a movie. It’s unclear how traditional neural nets could use their reasoning about previous events in the film to predict later ones.

  • Method : RNNs address this issue. They are networks with loops in them, allowing information to persist from one step of the sequence to the next (see the sketch below).

  • Limitations : The basic RNN design struggles with long-term dependencies (i.e. longer data sequences, paragraph-level rather than sentence-level)

Figure 2. (Above) A standard RNN contains a single layer
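
A minimal NumPy sketch of that idea (the weight names W_xh, W_hh, b_h and the toy sizes are illustrative assumptions, not from the source): the same cell is applied at every time step, and the hidden state h is the "loop" that lets information persist.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: the new hidden state mixes the current input
    with the previous hidden state, so earlier information can persist."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy sizes: input dimension 8, hidden dimension 16, sequence length 5
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(8, 16))
W_hh = rng.normal(scale=0.1, size=(16, 16))
b_h = np.zeros(16)

h = np.zeros(16)                      # initial hidden state
for x_t in rng.normal(size=(5, 8)):   # the recurrence: loop over the sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)                        # (16,) -- a summary of everything seen so far
```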


2. Long Short-Term Memory (1997)

  • Introduction : LSTMs are explicitly designed to deal with the long-term dependency problem.

  • Methods : Internally, LSTMs maintain a cell state, which can be thought of as the main information stream, along with three gates that control the cell state (forget gate, input gate, output gate); a minimal sketch follows the figures below.

    1. Forget Gate : The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This decision is made by a sigmoid layer called the forget gate.

      $$ G_f(h_{t-1}, x_t) = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

    2. Input Gate : The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid layer called the “input gate” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values $\tilde{C}_t$ that could be added to the state.

      $$G_i(h_{t-1}, x_t) = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

      $$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

    3. Update Cell State : It’s time to update the old cell state, $C_{t-1}$, into the new cell state $C_t$.

      $$C_t = G_f * C_{t-1} + G_i * \tilde{C}_t$$

    4. Output Gate : Finally, we need to decide what information we’re going to return. This output will be based on our cell state, but will be a filtered version of it.

      $$ G_o(h_{t-1}, x_t) = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) $$

      $$ h_t = G_o * \tanh(C_t) $$

Figure 3. An LSTM contains four interacting layers

Figure 4. Step 1: decide what information to throw away (L), Steps 2 & 3: decide what to store (M), Step 4: decide what to output (R)
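
Putting the four steps together, here is a minimal NumPy sketch of a single LSTM step that follows the equations above; the function and weight names (lstm_step, W_f, W_i, W_C, W_o, …) are my own, and each weight matrix acts on the concatenation $[h_{t-1}, x_t]$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One LSTM step following Steps 1-4 above; every W_* acts on [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    G_f = sigmoid(W_f @ z + b_f)         # Step 1: forget gate
    G_i = sigmoid(W_i @ z + b_i)         # Step 2: input gate
    C_tilde = np.tanh(W_C @ z + b_C)     #         candidate values
    C_t = G_f * C_prev + G_i * C_tilde   # Step 3: update the cell state
    G_o = sigmoid(W_o @ z + b_o)         # Step 4: output gate
    h_t = G_o * np.tanh(C_t)             #         filtered view of the cell state
    return h_t, C_t

# Toy sizes: hidden dimension H=4, input dimension D=3
H, D = 4, 3
rng = np.random.default_rng(0)
W = lambda: rng.normal(scale=0.1, size=(H, H + D))
h, C = np.zeros(H), np.zeros(H)
h, C = lstm_step(np.ones(D), h, C, W(), np.zeros(H), W(), np.zeros(H),
                 W(), np.zeros(H), W(), np.zeros(H))
print(h.shape, C.shape)  # (4,) (4,)
```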


3. Seq2Seq (2014)

  • Introduction : DNNs can only be applied to problems whose inputs and targets can be sensibly encoded with vectors of fixed dimensionality. This is a significant limitation for sequential problems such as speech recognition and machine translation.

  • Methods : use one LSTM (the encoder) to read the input sequence, and another LSTM (the decoder) to generate the output sequence (see the sketch after the steps below)

    1. Encoder : tokenized input sentences are fed into the encoder, which produces a context vector

    2. Context Vector : a fixed-size summary of the input sequence (the encoder’s final hidden state), used to initialize the decoder

    3. Decoder : essentially an RNN language model, conditioned on the context vector, that generates the output sequence token by token
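
A rough sketch of the encoder/decoder split, using a simplified tanh recurrent step standing in for the LSTM cell of Section 2; the parameter names and toy sizes are assumptions for illustration, and decoding here is greedy and untrained, purely to show the data flow through the context vector.

```python
import numpy as np

def step(x_t, h_prev, W_x, W_h, b):
    """Simplified recurrent step (stands in for an LSTM cell, see Section 2)."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

def encode(src, p):
    """Encoder: read the whole input sequence; the final state is the context vector."""
    h = np.zeros(p["W_h"].shape[0])
    for x_t in src:
        h = step(x_t, h, p["W_x"], p["W_h"], p["b"])
    return h

def decode(context, p, start, max_len):
    """Decoder: an RNN language model initialised with the context vector,
    feeding its own (greedy) output back in as the next input."""
    h, y_t, outputs = context, start, []
    for _ in range(max_len):
        h = step(y_t, h, p["W_x"], p["W_h"], p["b"])
        y_t = h @ p["W_out"]          # project the hidden state to the output space
        outputs.append(y_t)
    return outputs

# Toy dimensions: embedding size 8, hidden size 16, output length 4
rng = np.random.default_rng(0)
enc = {"W_x": rng.normal(scale=0.1, size=(8, 16)),
       "W_h": rng.normal(scale=0.1, size=(16, 16)), "b": np.zeros(16)}
dec = {"W_x": rng.normal(scale=0.1, size=(8, 16)),
       "W_h": rng.normal(scale=0.1, size=(16, 16)), "b": np.zeros(16),
       "W_out": rng.normal(scale=0.1, size=(16, 8))}

context = encode(rng.normal(size=(6, 8)), enc)    # read a length-6 source sequence
out = decode(context, dec, start=np.zeros(8), max_len=4)
print(len(out), out[0].shape)                     # 4 (8,)
```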
