LM #4 | Language Model Enhancing

2024-04-13 3. Natural Language Comments

Summary

Overview : Techniques that improve the performance of large language models
- RAG (Retrieval Augmented Generation)
  - Improved factual accuracy: The model can refer to real documents rather than relying solely on its pretraining.
  - Smaller models can perform better: Because the knowledge can be retrieved, the model doesn’t need to memorize everything.
  - Domain adaptation: You can add specialized knowledge (e.g., company manuals, legal docs) without retraining the LLM.
  - Reduced hallucinations: More grounded responses since they’re based on actual sources.

1. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020)

Summary : RAG literally (1) retrieves the top-k documents for a given query, (2) augments the latent vector by concatenating the original input x with the retrieved passage z, and (3) generates the final output.
Abstract
- Background. LLMs have been shown to store factual knowledge in their parameters, and achieve SOTA results when fine-tuned on down-stream NLP tasks.
- Problem Def. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures.
- Contrib. We explore a general-purpose fine-tuning recipe for RAG ? models which combine pre-trained parametric and non-parametric memory for language generation.
Method : We explore RAG models, which use the input sequence $x$ to retrieve text documents $z$ and use them as additional context when generating the target sequence $y$.
- Retrieval (DPR) : returns top-K distributions over text passages given a query $x$
  - To train the retriever and generator end-to-end, we treat the retrieved document as a latent variable.
- Generator $p_\theta(y_i | x, z, y_{1:i-1})$ : generates a current token based on a context of the previous tokens $y_{1:i-1}$ , the original input $x$ and a retrived passage $z$
  - could be modelled using any encoder-decoder.
  - To combine the input $x$ with the retrieved content $z$ when generating from BART, we simply concatenate them.

RAG overview

2. RAPTOR: Recursive Abstractive Processing for Tree Organized Retrieval

Abstract
- background. Retrieval-augmented language models can better adapt to changes in world state incorporate long-tail knowledge.
- Problem Def. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic undestanding of the overall document context.
- Contrib. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up
Method : design an indexing and retrieval system that uses a tree structure - RAPTOR
- Indexing : Recursive Abstractive Processing (Embedding, Clustering and Summarization)
  - Preparation : Segmenting the retrieval corpus into short, contiguous texts of length 100, similar to traditional retrieval augmentation techniques.
  - Embedding : These texts are then embedded using SBERT,
  - Clustering : clusters chunks of text, organizaing text segments into cohesice groups.
  - Summarization : generates text summaries of those clusters with GPT3.5, and then repeats, generating a tree from the bottom up.
- Retrieval (Querying) : Collapsed tree method, offering unique ways of traversing the multi-layered RAPTOR tree to retrieve relevant information
  - First, collapse the entire RAPTOR tree into a single layer.
  - Next, calculate the cosine similarity between the query embedding and the embeddings of all nodes present in the collapsed set.
  - Pick the top-k nodes that have the highest cosine similarity scores with the query.
  - Finally, add retrieved context with query and put it into LLM

RAPTOR overview