NLP #6 | Text Summarization

Summary

  • Text Summarization: a technique for shortening long texts so that the summary retains all the important points of the original document.

    • By Summarization Approach

      1. Extraction-based Summarization: The extractive approach picks the most important phrases and sentences from the document and combines them to create the summary. Every line and word of the summary therefore belongs to the original document being summarized.

      2. Abstraction-based Summarization: The abstractive approach generates the summary, typically with deep learning. It uses new phrases and terms, different from the actual document, while keeping the key points the same, just like how humans summarize. It is therefore much harder than the extractive approach. (A toy contrast between the two approaches is sketched after this list.)

    • By Number of Source Documents

      1. Single-Document Summary: a summary produced from a single document
      2. Multi-Document Summary: a summary produced from multiple documents
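
To make the contrast concrete, here is a minimal Python sketch. The extractive side is a naive frequency-based sentence scorer (an illustrative baseline, not a published method); the abstractive side is shown as a commented call to a pretrained seq2seq model through Hugging Face's `pipeline` API, where the checkpoint name is just one common choice.

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    """Naive extractive baseline: score each sentence by the document
    frequency of its words, then return the top-n sentences verbatim."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    scored = [(sum(freq[w] for w in re.findall(r"\w+", s.lower())), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:n_sentences]
    # Restore document order so the summary reads naturally.
    return " ".join(s for _, i, s in sorted(top, key=lambda t: t[1]))

# Abstractive summarization instead *generates* new text with a seq2seq model.
# The model choice below is an assumption; any summarization checkpoint works:
# from transformers import pipeline
# summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
# print(summarizer(text, max_length=60, min_length=10)[0]["summary_text"])
```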

Terminologies

  • gold summary, reference summary: the ground-truth summary against which generated summaries are evaluated

References



1. TextRank: Bringing Order into Texts (2004)

  • Introduction: the traditional representative of the extractive approach before deep-learning-based methods.

  • Method: similar to the PageRank algorithm, applied to a graph whose vertices are sentences (see the sketch below).

    • PageRank: the most representative graph ranking algorithm, also famous as the ranking algorithm of early Google's search engine. It assumes that pages with many backlinks are important, so each link works like a vote. TextRank adapts this to text: edges are weighted by sentence similarity, and vertices are ranked with the weighted score $WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$, where $d$ is a damping factor (typically 0.85).
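
A minimal sketch of TextRank for sentence extraction, assuming the paper's normalized word-overlap similarity (with +1 inside the logs as a guard for very short sentences) and plain power iteration; the tokenization here is a simplification, not the paper's exact preprocessing.

```python
import re
import math
import numpy as np

def textrank_summary(text, n_sentences=2, d=0.85, iters=50):
    """Rank sentences with TextRank: build a similarity graph over the
    sentences, then run a weighted PageRank-style power iteration."""
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    tokens = [set(re.findall(r"\w+", s.lower())) for s in sents]
    n = len(sents)

    # Edge weight = word overlap normalized by sentence lengths
    # (Mihalcea & Tarau, 2004), with +1 in the logs as a safety guard.
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and tokens[i] and tokens[j]:
                W[i, j] = len(tokens[i] & tokens[j]) / (
                    math.log(len(tokens[i]) + 1) + math.log(len(tokens[j]) + 1))

    # Row-normalize so each vertex distributes its score over its out-edges.
    out = W.sum(axis=1, keepdims=True)
    P = np.divide(W, out, out=np.zeros_like(W), where=out > 0)

    # Power iteration of WS(V_i) = (1 - d) + d * sum_j P[j, i] * WS(V_j).
    ws = np.ones(n)
    for _ in range(iters):
        ws = (1 - d) + d * (P.T @ ws)

    # Keep the top-ranked sentences, restored to document order.
    top = sorted(np.argsort(ws)[::-1][:n_sentences])
    return " ".join(sents[i] for i in top)
```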


2. BERTSum: Text Summarization with Pretrained Encoders (2019)

  • Introduction: proposes a novel document-level summarizer based on BERT.

  • Method: fine-tuning BERT for summarization

    • Summarization Encoder: in order to represent individual sentences, external [CLS] tokens are inserted at the start of each sentence, and each [CLS] symbol collects features for the sentence preceding it.
    • Extractive Summarization: can be defined as the binary classification task of assigning a label to each $sent_i$ indicating whether the sentence should be included in the summary.
      • With BERTSUM, the vector $t_i$ of the $i$-th [CLS] symbol from the top layer can be used as the representation for $sent_i$.
      • Several inter-sentence Transformer layers are then stacked on top of the BERT outputs to capture document-level features for extracting summaries; a minimal sketch follows below this list.
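
A minimal PyTorch sketch of the extractive model, assuming Hugging Face's `transformers` BertModel as the pretrained encoder; the layer and head counts of the inter-sentence Transformer are illustrative choices, not the paper's exact hyperparameters, and the paper's interval segment embeddings are omitted.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertSumExt(nn.Module):
    """Extractive head: BERT encodes the document, the per-sentence [CLS]
    vectors t_i are refined by inter-sentence Transformer layers, and a
    sigmoid scores each sentence for inclusion in the summary."""
    def __init__(self, num_inter_layers=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8,
                                           batch_first=True)
        self.inter_sent = nn.TransformerEncoder(layer, num_inter_layers)
        self.score = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        # Token-level representations: (batch, seq_len, hidden).
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        # Gather t_i, the top-layer vector at each [CLS] position.
        idx = cls_positions.unsqueeze(-1).expand(-1, -1, h.size(-1))
        t = h.gather(1, idx)          # (batch, n_sents, hidden)
        t = self.inter_sent(t)        # document-level sentence features
        return torch.sigmoid(self.score(t)).squeeze(-1)  # P(include sent_i)
```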
    • Abstractive Summarization: uses a standard encoder-decoder framework for abstractive summarization.
      • The encoder is the pretrained BERTSUM and the decoder is a randomly initialized 6-layer Transformer (sketched below).
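
And a corresponding sketch of the abstractive setup, again assuming `transformers` BertModel for the encoder; PyTorch's stock `nn.TransformerDecoder` stands in for the paper's 6-layer decoder, and the paper's training details (e.g., separate learning-rate schedules for encoder and decoder) and target positional embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertSumAbs(nn.Module):
    """Abstractive model: pretrained BERT encoder plus a randomly
    initialized 6-layer Transformer decoder generating summary tokens."""
    def __init__(self, vocab_size=30522, num_decoder_layers=6):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.encoder.config.hidden_size
        self.tgt_embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerDecoderLayer(d_model=hidden, nhead=8,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_decoder_layers)
        self.generator = nn.Linear(hidden, vocab_size)

    def forward(self, input_ids, attention_mask, tgt_ids):
        # Encode the source document with the pretrained BERT encoder.
        memory = self.encoder(input_ids,
                              attention_mask=attention_mask).last_hidden_state
        tgt = self.tgt_embed(tgt_ids)  # (positional embeddings omitted)
        # Causal mask: each position attends only to earlier summary tokens.
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.generator(out)  # logits over the summary vocabulary
```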