Part 3: Inside Atom Ai – How Generation Processes Enrich AI Conversations
This is the third article in a series exploring the technical foundations that power Atom Ai (formerly WWT GPT), a GenAI-powered chatbot developed to increase employee productivity. This installment explores how different generation processes enrich AI conversations.
Augmentation: The "A" in RAG
Augmentation refers to any transformation applied to the context gathered during retrieval before it is sent to the large language model (LLM) for response generation. This can range from totally transforming the retrieved documents to leaving them untouched. Conceptually, augmentation is necessary whenever the data you pass to the LLM as context needs to be different from what is used and returned in retrieval.
Put simply, augmentation takes in content that is optimized for retrieval and outputs content that is optimized for use by the LLM in generating its response. There's some flexibility here, and the proper transformations may differ significantly across datasets and use cases.
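To make this concrete, here is a minimal sketch of where augmentation sits in a RAG pipeline. The `retriever`, `augment`, and `llm` objects and their methods are illustrative assumptions for this sketch, not Atom Ai's actual components.

```python
from dataclasses import dataclass

@dataclass
class Document:
    content: str    # text optimized for retrieval (e.g., a summary)
    metadata: dict  # source info, e.g., {"type": "video_transcript"}

def answer(query: str, retriever, augment, llm) -> str:
    docs = retriever.search(query)   # retrieval: find candidate documents
    docs = augment(docs, query)      # augmentation: reshape them for the LLM
    context = "\n\n".join(d.content for d in docs)
    prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
    return llm.complete(prompt)      # generation
```

The key point is that `augment` is a seam between retrieval and generation: whatever was stored for searchability can be reshaped there into whatever the LLM actually needs.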
For Atom Ai, we use two categories of transformations within our RAG pipeline.
Handling video transcripts through contextual compression
Contextual compression is a step in which an LLM reduces the amount of unnecessary or irrelevant information in each document and removes documents that are completely irrelevant. The goal is to fit a larger number of documents, and therefore a more diverse set of information, within the LLM's context window.
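As a rough illustration, contextual compression can be implemented as one extra LLM call per retrieved document. The prompt wording and the `llm.complete` helper below are assumptions for the sketch, not our production prompt.

```python
COMPRESS_PROMPT = (
    "Extract only the passages from the document below that help answer the "
    "question. If nothing is relevant, reply with exactly NOT_RELEVANT.\n\n"
    "Question: {query}\n\nDocument:\n{doc}"
)

def compress(docs: list[str], query: str, llm) -> list[str]:
    kept = []
    for doc in docs:
        extracted = llm.complete(
            COMPRESS_PROMPT.format(query=query, doc=doc)
        ).strip()
        if extracted != "NOT_RELEVANT":   # drop fully irrelevant documents
            kept.append(extracted)        # keep only the extracted passages
    return kept
```

Frameworks such as LangChain package a similar pattern (its contextual compression retriever), but the core idea is the per-document extraction call shown above.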
Contextual compression may not be suitable for every document or use case. Compressing small documents or chunks can leave too little information in the context, hindering the final response produced by the LLM. It also introduces additional LLM calls into the RAG pipeline, which can slow it down significantly. Given these tradeoffs, compression is best used for large documents that on their own would fill most of the LLM's context window.
In Atom Ai, these large documents take the form of transcripts for the videos on wwt.com. The transcripts are handled through the following process:
- At the time of indexing, an information-focused summary of the transcript is created and used for retrieval. The idea is that this summary should be more suitable for searching than the entire transcript.
- If one of these transcript summaries is included in the retrieved context, the summary is replaced by the original transcript.
- Contextual compression is applied to the transcript to reduce the amount of text, keeping only the pieces relevant to the user's question (see the sketch after this list).
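Put together, the transcript flow might look like the sketch below. The helper names (`llm.complete`, `index.add`) and the in-memory `transcripts` lookup are hypothetical stand-ins for illustration, not Atom Ai's internal API.

```python
SUMMARY_PROMPT = (
    "Write an information-dense summary of this video transcript:\n\n{t}"
)
COMPRESS_PROMPT = (
    "Keep only the parts of this transcript that are relevant to the "
    "question, preserving speaker names and quotes verbatim.\n\n"
    "Question: {q}\n\nTranscript:\n{t}"
)

def index_transcript(doc_id: str, transcript: str, llm, index) -> None:
    # Index time: store a search-friendly summary instead of the raw transcript.
    summary = llm.complete(SUMMARY_PROMPT.format(t=transcript))
    index.add(doc_id=doc_id, text=summary,
              metadata={"type": "video_transcript"})

def expand_and_compress(docs: list[dict], query: str, llm,
                        transcripts: dict[str, str]) -> list[dict]:
    out = []
    for doc in docs:
        if doc["metadata"].get("type") == "video_transcript":
            # Query time: swap the retrieved summary for the full transcript...
            full = transcripts[doc["id"]]
            # ...then compress it down to question-relevant passages.
            doc = {**doc,
                   "text": llm.complete(COMPRESS_PROMPT.format(q=query, t=full))}
        out.append(doc)
    return out
```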
This process allows the chatbot to retain specific details from these large transcripts, such as speaker names or quotes, while ensuring the transcript content passed to the LLM is relevant to the user's question. That ability comes at the cost of a few seconds of added response latency from the extra LLM calls.
"WWT Research reports provide in-depth analysis of the latest technology and industry trends, solution comparisons and expert guidance for maturing your organization's capabilities. By logging in or creating a free account you’ll gain access to other reports as well as labs, events and other valuable content."
Thanks for reading. Want to continue?
Log in or create a free account to continue viewing Part 3: Inside Atom Ai – How Generation Processes Enrich AI Conversations and access other valuable content.