LM #4 | Language Model Enhancing


Summary

  • Overview : Techniques that improve the performance of large language models
    • RAG (Retrieval Augmented Generation)
      • Improved factual accuracy: The model can refer to real documents rather than relying solely on its pretraining.
      • Smaller models can perform better: Because the knowledge can be retrieved, the model doesn’t need to memorize everything.
      • Domain adaptation: You can add specialized knowledge (e.g., company manuals, legal docs) without retraining the LLM.
      • Reduced hallucinations: More grounded responses since they’re based on actual sources.


1. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020)

  • Summary : RAG literally (1) retrieves the top-k documents for a given query, (2) augments the latent vector by concatenating the original input x with the retrieved passage z, and (3) generates the final output.
  • Abstract
    • Background. LLMs have been shown to store factual knowledge in their parameters, and achieve SOTA results when fine-tuned on down-stream NLP tasks.
    • Problem Def. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures.
    • Contrib. We explore a general-purpose fine-tuning recipe for RAG ? models which combine pre-trained parametric and non-parametric memory for language generation.
  • Method : We explore RAG models, which use the input sequence $x$ to retrieve text documents $z$ and use them as additional context when generating the target sequence $y$.
    • Retrieval (DPR) : returns top-K distributions over text passages given a query $x$
      • To train the retriever and generator end-to-end, we treat the retrieved document as a latent variable.
    • Generator $p_\theta(y_i | x, z, y_{1:i-1})$ : generates a current token based on a context of the previous tokens $y_{1:i-1}$ , the original input $x$ and a retrived passage $z$
      • could be modelled using any encoder-decoder.
      • To combine the input $x$ with the retrieved content $z$ when generating from BART, we simply concatenate them.

RAG overview



2. RAPTOR: Recursive Abstractive Processing for Tree Organized Retrieval

  • Abstract
    • background. Retrieval-augmented language models can better adapt to changes in world state incorporate long-tail knowledge.
    • Problem Def. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic undestanding of the overall document context.
    • Contrib. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up
  • Method : design an indexing and retrieval system that uses a tree structure - RAPTOR
    • Indexing : Recursive Abstractive Processing (Embedding, Clustering and Summarization)
      • Preparation : Segmenting the retrieval corpus into short, contiguous texts of length 100, similar to traditional retrieval augmentation techniques.
      • Embedding : These texts are then embedded using SBERT,
      • Clustering : clusters chunks of text, organizaing text segments into cohesice groups.
      • Summarization : generates text summaries of those clusters with GPT3.5, and then repeats, generating a tree from the bottom up.
    • Retrieval (Querying) : Collapsed tree method, offering unique ways of traversing the multi-layered RAPTOR tree to retrieve relevant information
      • First, collapse the entire RAPTOR tree into a single layer.
      • Next, calculate the cosine similarity between the query embedding and the embeddings of all nodes present in the collapsed set.
      • Pick the top-k nodes that have the highest cosine similarity scores with the query.
      • Finally, add retrieved context with query and put it into LLM

RAPTOR overview