Fusion of RAG (Retrieval Augmented Generation) and CAG (Cache Augmented Generation). How can you benefit from it as an AI Engineer?

A few months ago there was a lot of hype around a technique called CAG. While it is powerful in its own right, the real magic happens when you combine CAG with regular RAG. Let's see what that would look like and what additional considerations should be taken into account.

Here are example steps to implement a CAG + RAG architecture:

Data Preprocessing:

1. Use only rarely changing data sources for Cache Augmented Generation. Beyond the requirement that the data changes rarely, also consider which sources are hit most often by relevant queries. Only once you have this information, pre-compute all of the selected data into the LLM's KV cache and keep it in memory. This needs to be done only once; the following steps can run many times without recomputing the initial cache (see the first sketch below).

2. For RAG, if necessary, precompute and store vector embeddings in a compatible database to be searched later in step 4. Sometimes simpler data types are enough for RAG, and a regular database might suffice (second sketch below).

Query Path: we can now utilise the preprocessed data.

3. Compose a prompt that includes the user query and a system prompt instructing the LLM on how the cached context and the retrieved external context should be used.

4. Embed the user query for semantic search and query the vector DB (the context store) to retrieve relevant data. If semantic search is not required, query other sources, such as real-time databases or the web.

5. Enrich the final prompt with the external context retrieved in step 4.

6. Return the final answer to the user (the third sketch below ties steps 3-6 together).

Some Considerations:

➡️ The context window is not infinite, and even though some models boast enormous context window sizes, the needle-in-a-haystack problem has not been solved yet, so use the available context wisely and cache only the data you really need.

✅ For some business cases, specific datasets are extremely valuable when passed to the model as cache. Think of an assistant that must always comply with a lengthy set of internal rules spread across multiple documents.

✅ While CAG has been popularised for open-source models only recently, it has been available for some time via the Prompt Caching features of the OpenAI and Anthropic APIs. It is really easy to start prototyping there (fourth sketch below).

✅ Always separate hot and cold data sources, and use only cold data (data that changes rarely) in your cache; otherwise the data will go stale and the application will fall out of sync.

✅ Be very careful about what you cache, as the cached data will be available for all users to query.

✅ It is very hard to enforce role-based access control (RBAC) for cached data unless you run a separate model instance with its own cache per role.

Have you used the fusion of CAG and RAG already? Let me know about your results in the comments 👇

#LLM #AI #MachineLearning
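To make step 1 concrete, here is a minimal CAG sketch using Hugging Face transformers and its DynamicCache, assuming a local model. The model name, the internal_rules.txt source, and the answer() helper are illustrative placeholders, and chat templating is omitted for brevity:

```python
# Minimal CAG sketch (assumed stack: Hugging Face transformers; adapt to yours).
# Precompute the KV cache for rarely-changing context once, reuse it per query.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.cache_utils import DynamicCache

MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)

static_context = open("internal_rules.txt").read()  # hypothetical cold data source

# Run the static context through the model once to prefill the KV cache.
ctx_ids = tokenizer(static_context, return_tensors="pt").input_ids
kv_cache = DynamicCache()
with torch.no_grad():
    model(input_ids=ctx_ids, past_key_values=kv_cache, use_cache=True)
cache_len = ctx_ids.shape[1]  # remember where the static prefix ends

def answer(query: str, max_new_tokens: int = 256) -> str:
    # Trim the cache back to the static prefix: the previous call's query and
    # answer tokens were appended to it during generation.
    kv_cache.crop(cache_len)
    q_ids = tokenizer(query, return_tensors="pt", add_special_tokens=False).input_ids
    full_ids = torch.cat([ctx_ids, q_ids], dim=-1)
    # Only the query tokens are newly processed; the prefix comes from the cache.
    out = model.generate(full_ids, past_key_values=kv_cache,
                         max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0, full_ids.shape[1]:], skip_special_tokens=True)
```

The crop() call is what keeps the cache reusable: generation mutates the cache by appending query and answer tokens, so it has to be reset to the static prefix before the next request.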
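For step 2, a minimal indexing-and-retrieval sketch, assuming sentence-transformers plus a plain NumPy dot product in place of a real vector DB; the documents and the retrieve() helper are made up for illustration:

```python
# Minimal RAG indexing sketch (assumed stack: sentence-transformers + NumPy;
# swap in your embedding model and vector DB of choice).
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

# "Hot" documents that change too often to bake into the KV cache.
docs = [
    "Q3 pricing sheet, updated weekly.",
    "Current on-call support rota.",
    "Release 2.4 notes and known issues.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)  # unit vectors

def retrieve(query: str, k: int = 3) -> list[str]:
    # On normalized vectors, cosine similarity reduces to a dot product.
    q_vec = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]
```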
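And for steps 3-6, a query-path sketch that stitches the two pieces above together; it reuses the hypothetical answer() and retrieve() helpers, and the system prompt wording is just an example:

```python
# Query-path sketch tying steps 3-6 together (builds on the two sketches above).
SYSTEM_PROMPT = (
    "Answer using the cached internal rules already in context. "
    "Use the retrieved documents below only for facts the rules do not cover."
)

def answer_with_cag_and_rag(query: str) -> str:
    retrieved = retrieve(query)                        # step 4: semantic search
    external = "\n".join(f"- {d}" for d in retrieved)  # step 5: enrich the prompt
    prompt = (f"{SYSTEM_PROMPT}\n\nRetrieved context:\n{external}\n\n"
              f"User question: {query}")
    return answer(prompt)                              # step 6: generate on the KV cache
```

Note the ordering: the KV cache is a prefix cache, so the static rules stay first and everything that changes per request (retrieved snippets, the query) comes after the cached tokens.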
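As a reference for the managed route, here is roughly what prompt caching looks like with the Anthropic API. The cache_control block is the documented mechanism; the model id, rules file, and question are placeholders:

```python
# Managed prompt-caching sketch via the Anthropic API.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
long_rules = open("internal_rules.txt").read()  # hypothetical cold data source

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=512,
    system=[
        {"type": "text", "text": "Follow the internal rules below in every answer."},
        # Mark the large, rarely-changing block as cacheable; later calls that
        # reuse the exact same prefix hit the cache instead of re-processing it.
        {"type": "text", "text": long_rules, "cache_control": {"type": "ephemeral"}},
    ],
    messages=[{"role": "user", "content": "Does rule 12 apply to contractors?"}],
)
print(response.content[0].text)
```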