Part 3: Inside Atom Ai – How Generation Processes Enrich AI Conversations
This is the third article in a series exploring the technical foundations that power Atom Ai (formerly WWT GPT), a GenAI-powered chatbot developed to increase employee productivity. This installment explores how different generation processes enrich AI conversations.
Augmentation: The "A" in RAG
Augmentation refers to any transformation applied to the context gathered during retrieval before it is sent to the large language model (LLM) for response generation. This can range from totally transforming the retrieved documents to leaving them untouched. Conceptually, augmentation is necessary whenever the data you pass to the LLM as context needs to be different from what is used and returned in retrieval.
Put simply, augmentation takes in content that is optimized for retrieval and outputs content that is optimized for use by the LLM in generating its response. There's some flexibility here, and the proper transformations may differ significantly across datasets and use cases.
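To make this concrete, here is a minimal sketch of where augmentation sits in a RAG pipeline. The `retriever`, `augment`, and `llm` objects and their methods are illustrative assumptions for this sketch, not Atom Ai's actual components.

```python
from dataclasses import dataclass

@dataclass
class Document:
    content: str    # text optimized for retrieval (e.g., a summary)
    metadata: dict  # source info, e.g., {"type": "video_transcript"}

def answer(query: str, retriever, augment, llm) -> str:
    docs = retriever.search(query)   # retrieval: find candidate documents
    docs = augment(docs, query)      # augmentation: reshape them for the LLM
    context = "\n\n".join(d.content for d in docs)
    prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
    return llm.complete(prompt)      # generation
```

The key point is that `augment` is a seam between retrieval and generation: whatever was stored for searchability can be reshaped there into whatever the LLM actually needs.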
For Atom Ai, we use two categories of transformations within our RAG pipeline.
Handling video transcripts through contextual compression
Contextual compression is a step in which an LLM reduces the amount of unnecessary or irrelevant information in each document and removes documents that are completely irrelevant. The goal is to fit a larger number of documents, and therefore a more diverse set of information, within the LLM's context window.
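As a rough illustration, contextual compression can be implemented as one extra LLM call per retrieved document. The prompt wording and the `llm.complete` helper below are assumptions for the sketch, not our production prompt.

```python
COMPRESS_PROMPT = (
    "Extract only the passages from the document below that help answer the "
    "question. If nothing is relevant, reply with exactly NOT_RELEVANT.\n\n"
    "Question: {query}\n\nDocument:\n{doc}"
)

def compress(docs: list[str], query: str, llm) -> list[str]:
    kept = []
    for doc in docs:
        extracted = llm.complete(
            COMPRESS_PROMPT.format(query=query, doc=doc)
        ).strip()
        if extracted != "NOT_RELEVANT":   # drop fully irrelevant documents
            kept.append(extracted)        # keep only the extracted passages
    return kept
```

Frameworks such as LangChain package a similar pattern (its contextual compression retriever), but the core idea is the per-document extraction call shown above.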
Contextual compression may not be suitable for every document or use case. Compressing small documents or chunks can leave too little information in the context, hindering the final response produced by the LLM. It also introduces additional LLM calls into the RAG pipeline, which can slow it down significantly. Given these tradeoffs, compression is best used for large documents that on their own would fill most of the LLM's context window.
In Atom Ai, these large documents take the form of transcripts for the videos on wwt.com. The transcripts are handled through the following process:
- At the time of indexing, an information-focused summary of the transcript is created and used for retrieval. The idea is that this summary should be more suitable for searching than the entire transcript.
- If one of these transcript summaries is included in the retrieved context, the summary is replaced by the original transcript.
- Contextual compression is applied to the transcript to reduce the amount of text, keeping only the pieces relevant to the user's question (see the sketch after this list).
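Put together, the transcript flow might look like the sketch below. The helper names (`llm.complete`, `index.add`) and the in-memory `transcripts` lookup are hypothetical stand-ins for illustration, not Atom Ai's internal API.

```python
SUMMARY_PROMPT = (
    "Write an information-dense summary of this video transcript:\n\n{t}"
)
COMPRESS_PROMPT = (
    "Keep only the parts of this transcript that are relevant to the "
    "question, preserving speaker names and quotes verbatim.\n\n"
    "Question: {q}\n\nTranscript:\n{t}"
)

def index_transcript(doc_id: str, transcript: str, llm, index) -> None:
    # Index time: store a search-friendly summary instead of the raw transcript.
    summary = llm.complete(SUMMARY_PROMPT.format(t=transcript))
    index.add(doc_id=doc_id, text=summary,
              metadata={"type": "video_transcript"})

def expand_and_compress(docs: list[dict], query: str, llm,
                        transcripts: dict[str, str]) -> list[dict]:
    out = []
    for doc in docs:
        if doc["metadata"].get("type") == "video_transcript":
            # Query time: swap the retrieved summary for the full transcript...
            full = transcripts[doc["id"]]
            # ...then compress it down to question-relevant passages.
            doc = {**doc,
                   "text": llm.complete(COMPRESS_PROMPT.format(q=query, t=full))}
        out.append(doc)
    return out
```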
This process allows the chatbot to retain specific details from these large transcripts, such as speaker names or quotes, while ensuring the transcript content passed to the LLM is relevant to the user's question. That ability comes at the cost of a few seconds of added response latency from the extra LLM calls.
"WWT Research reports provide in-depth analysis of the latest technology and industry trends, solution comparisons and expert guidance for maturing your organization's capabilities. By logging in or creating a free account you’ll gain access to other reports as well as labs, events and other valuable content."
Thanks for reading. Want to continue?
Log in or create a free account to continue viewing Part 3: Inside Atom Ai – How Generation Processes Enrich AI Conversations and access other valuable content.