LLM versions are frozen in time.
Once a model is released, it only contains information up to its training cutoff.
When new information or context is needed, the model alone (without RAG) will either hallucinate or reply that it doesn't know.
RAG lets us pass new context and data to an existing model by injecting external context into its prompt at runtime, instead of relying only on the model's pretrained knowledge.
RAG is about retrieval from connected data sources; the purpose is to generate more accurate, context-aware responses.
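A rough sketch of what runtime injection looks like. The retrieve callable here is hypothetical, standing in for whatever search step returns relevant passages:

```python
# Sketch of runtime prompt injection. "retrieve" is a hypothetical helper,
# not a real library call: it stands in for the retrieval step.
def augment(question: str, retrieve) -> str:
    context = "\n".join(retrieve(question))
    # Retrieved text is placed in the prompt, so the model answers from
    # fresh data instead of only its pretrained knowledge.
    return f"Context:\n{context}\n\nQuestion: {question}"

print(augment("What changed in the v2 pricing?",
              lambda q: ["Toy passage: v2 pricing raised the base tier."]))
```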
Semantic search is the method used to find relevant information across uploaded files.
Keyword search looks for exact word matches.
Semantic search finds conceptually similar content - even if the exact terms don't match.
Semantic search is accomplished using a vector database.
An embedding model converts the user's question to a vector, which is then compared against the stored vectors.
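A toy illustration of that comparison step. The vectors are hand-written 3-dimensional stand-ins; real embeddings have hundreds of dimensions and come from an embedding model, and the lookup happens inside a vector database:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: closer to 1.0 means more similar in meaning.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" keyed by the chunk they represent.
stored = {
    "refund policy": np.array([0.9, 0.1, 0.0]),
    "shipping times": np.array([0.1, 0.8, 0.3]),
}
# Pretend embedding of the question "how do I get my money back".
query = np.array([0.85, 0.15, 0.05])

best = max(stored, key=lambda k: cosine_similarity(query, stored[k]))
print(best)  # "refund policy" wins despite sharing no keywords with the query
```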
A typical RAG pipeline has five steps, sketched in code below:
Chunking: split source documents into smaller passages.
Embedding: convert each chunk into a vector.
Storage: save the vectors in a vector database.
Querying: embed the user's question and retrieve the most similar chunks.
Response generation: pass the retrieved chunks to the LLM as context for its answer.
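A minimal end-to-end sketch of those five steps, under loud assumptions: embed is a hash-based stand-in so the example runs without a real embedding model, and a plain in-memory list stands in for the vector database. Every name is illustrative, not a library API:

```python
import numpy as np

# Stand-in embedding: hashes words into a fixed-size vector. A real pipeline
# would call an embedding model here; this exists only so the sketch runs.
def embed(text: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# 1. Chunking: split a document into overlapping word windows.
def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

# 2-3. Embedding and storage: an in-memory list standing in for a vector DB.
store: list[tuple[str, np.ndarray]] = []

def index(document: str) -> None:
    for c in chunk(document):
        store.append((c, embed(c)))

# 4. Querying: embed the question, rank chunks by cosine similarity
# (dot product works here because embed() returns normalized vectors).
def query(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(store, key=lambda pair: -float(np.dot(q, pair[1])))
    return [text for text, _ in ranked[:k]]

# 5. Response generation: inject the retrieved chunks into the LLM prompt.
def build_prompt(question: str) -> str:
    context = "\n".join(query(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

index("Our refund policy allows returns within 30 days of purchase. "
      "Shipping normally takes 3 to 5 business days.")
print(build_prompt("How long do I have to return an item?"))
```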
RAG has a very low barrier to entry.