RAG Architecture
One of the most powerful applications enabled by LLMs is sophisticated question-answering (Q&A) chatbots: applications that can answer questions about specific source information using a technique known as Retrieval Augmented Generation (RAG).
Reference: Build a Retrieval Augmented Generation (RAG) App: Part 1 (python.langchain.com)
A typical RAG application has two main components:
1. Indexing
- Load: First we need to load our data. This is done with DocumentLoaders.
- Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.
- Store: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and Embeddings model.
2. Retrieval and generation
- Retrieve: Given a user input, relevant splits are retrieved from storage using a Retriever.
- Generate: A ChatModel / LLM produces an answer using a prompt that includes both the question and the retrieved data.
The overall flow: Query → Document Retriever → Prompt Engineering → LLM Inference
Example Code
from langchain_community.document_loaders import WebBaseLoader, PyPDFLoader
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import CharacterTextSplitter
from dotenv import load_dotenv
import os

load_dotenv('.env')

# Use ChatOpenAI when working with a chat model
llm = ChatOpenAI(model_name="gpt-4-turbo-preview", temperature=0)

# Load a web document
loader = WebBaseLoader("https://happyiminjay.tistory.com/entry/RAG-Introduction")
documents = loader.load()  # data loaded from the web

# Load a PDF document; lazy_load yields one Document per page
# To extract PDF images, see https://python.langchain.com/docs/how_to/document_loader_pdf/
loader = PyPDFLoader("https://arxiv.org/pdf/1706.03762")
pages = []
for page in loader.lazy_load():
    pages.append(page)

# Index the web and PDF documents together
documents.extend(pages)

# Embedding model
embedding = OpenAIEmbeddings()

# Text splitter
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=20)

if not os.path.exists("rag_index"):
    # Split documents into chunks
    texts = text_splitter.split_documents(documents)
    # Create a FAISS vector store from the chunks
    vectorstore = FAISS.from_documents(texts, embedding)
    # Save the vector store to disk
    vectorstore.save_local("rag_index")
else:
    # Load the saved vector store
    vectorstore = FAISS.load_local("rag_index", embedding, allow_dangerous_deserialization=True)

# Query the vector store
query = "Types of search engines"
docs = vectorstore.similarity_search(query)
context = "\n\n".join(doc.page_content for doc in docs)

result = llm.invoke(f"""Based on the following context, please answer the query: {query}

Context:
{context}
""")
print(result.content)
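The same retrieve-then-generate step can also be expressed as a single LangChain runnable (LCEL) chain. A minimal sketch, reusing the vectorstore and llm built above:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Wrap the vector store's similarity search as a retriever
retriever = vectorstore.as_retriever()

prompt = ChatPromptTemplate.from_template(
    "Based on the following context, please answer the query: {question}\n\n"
    "Context:\n{context}\n"
)

def format_docs(docs):
    # Concatenate retrieved chunks into one context string
    return "\n\n".join(doc.page_content for doc in docs)

# Query -> Document Retriever -> Prompt -> LLM, matching the flow above
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print(rag_chain.invoke("Types of search engines"))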
Data Preparation
Retrieval Augmented Generation (RAG) is a simple yet powerful approach that can be used to improve the performance of LLMs on a wide range of tasks.
Reference: Retrieval Augmented Generation (RAG): What, Why and How? (llmstack.ai)
Vector Store
Vector stores hold unstructured data like text, images, and audio, and support search by semantic similarity. An embedding model generates vector embeddings for the data stored in the database. The data must first be chunked into smaller pieces, and the right chunking strategy depends on the type of data, the use case, and the embedding model: text can be chunked into sentences or paragraphs, code into functions or classes, and smaller chunks work well when you want to fit a wide range of snippets into the LLM's context. Once the data is chunked, an embedding is generated for each chunk and stored in the vector store. At query time, the query is also converted into an embedding, and the vector store returns the stored embeddings most similar to it.
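To make the mechanics concrete, here is a minimal sketch of the embed-and-search loop using OpenAI embeddings and plain cosine similarity; the hardcoded chunks list and brute-force scan stand in for a real vector store such as FAISS:

import numpy as np
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings()

# Hypothetical pre-chunked data
chunks = [
    "FAISS is a library for efficient vector similarity search.",
    "Graph databases store data as nodes and edges.",
    "Keyword search indexes documents by their terms.",
]

# Indexing time: embed every chunk once
chunk_vectors = np.array(embedding.embed_documents(chunks))

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Query time: embed the query and rank the chunks by similarity
query_vector = np.array(embedding.embed_query("How can I search vectors quickly?"))
scores = [cosine_similarity(query_vector, v) for v in chunk_vectors]
print(chunks[int(np.argmax(scores))])  # the most semantically similar chunk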
Keyword Search
Keyword search is a simple retrieval approach: the data is indexed by keywords, and the search engine returns the documents that contain the query's keywords. It is useful for storing structured data like tables and documents and for searching that data with keyword queries.
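A toy inverted index illustrates the idea; this is a sketch only, and real engines such as Elasticsearch add tokenization, stemming, and relevance scoring (e.g. BM25):

from collections import defaultdict

documents = {
    1: "FAISS is a library for vector similarity search",
    2: "graph databases store nodes and edges",
    3: "keyword search uses an inverted index",
}

# Build the inverted index: keyword -> set of document ids
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def keyword_search(query):
    # Return the documents that contain every keyword in the query
    ids = [index[word] for word in query.lower().split()]
    return set.intersection(*ids) if ids else set()

print(keyword_search("inverted index"))  # {3}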
Graph Database
Graph databases store data as nodes and edges. They are useful for structured data and for searching via the relationships between records. For example, if you are storing data about people, you can create a node for each person and an edge between any two people who know each other. When a query is made, the graph database returns the nodes connected to the query node. This kind of knowledge-graph retrieval is useful for tasks like question answering where the answer is a person or an entity.
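Here is a minimal sketch of that "people who know each other" example, using the networkx library as an in-memory stand-in for a real graph database such as Neo4j:

import networkx as nx

# Nodes are people; an edge means "knows"
graph = nx.Graph()
graph.add_edge("Alice", "Bob")
graph.add_edge("Alice", "Carol")
graph.add_edge("Bob", "Dave")

# "Who does Alice know?" -> the nodes connected to the query node
print(list(graph.neighbors("Alice")))  # ['Bob', 'Carol']

# Relationships can also be traversed, e.g. the chain of acquaintance
print(nx.shortest_path(graph, "Carol", "Dave"))  # ['Carol', 'Alice', 'Bob', 'Dave']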
Search Engine
1. Tavily Search
Reference: LLM Enhanced Web Search: The Tavily & LangChain (www.kaggle.com)
How is Tavily different from other search APIs?
Current search APIs such as Google, Serp, and Bing retrieve search results based on the user's query. However, the results are sometimes irrelevant to the goal of the search, and they return bare site URLs and snippets of content that are not always relevant. Because of this, a developer would then need to scrape the sites for relevant content, filter out irrelevant information, optimize the content to fit LLM context limits, and more. This task is a burden and requires skill to get right.
The Tavily Search API aggregates 20+ sites per API call and uses proprietary AI to score, filter, and rank the most relevant sources and content for your task, query, or goal. In addition, Tavily allows developers to add custom fields such as context and limit response tokens to enable the optimal search experience for LLMs. Lastly, Tavily indexes and ranks search results based on factors such as trusted sources and content quality, which allows for a more accurate and relevant search experience for AI agents.
Remember: because LLMs hallucinate, it's crucial to optimize RAG with the right context and information.
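A minimal sketch of calling Tavily through the LangChain integration; it assumes the langchain-community package is installed and a TAVILY_API_KEY is set in the environment:

from langchain_community.tools.tavily_search import TavilySearchResults

# Requires TAVILY_API_KEY in the environment
search = TavilySearchResults(max_results=3)

# Returns a list of {"url": ..., "content": ...} results,
# already filtered and ranked for LLM consumption
for result in search.invoke("What is Retrieval Augmented Generation?"):
    print(result["url"])
    print(result["content"][:200])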
2. SerpAPI (Google/Bing/Yahoo Search)
Reference: SerpAPI (python.langchain.com), a notebook on using the SerpAPI component to search the web.
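A minimal sketch using LangChain's SerpAPI wrapper; it assumes the google-search-results package is installed and a SERPAPI_API_KEY is set in the environment:

from langchain_community.utilities import SerpAPIWrapper

# Requires SERPAPI_API_KEY in the environment
search = SerpAPIWrapper()

# Runs a Google search and returns the top answer/snippet as a string
print(search.run("Who introduced the Transformer architecture?"))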
SerpAPI vs. Tavily
Opinion from https://www.linkedin.com/posts/grace-li-562a22236_tavily-activity-7166051617773981698-EDvb/
- SerpAPI
- Pros: Fast, reliable, provides structured output (title, link, source, date).
- Cons: Weak at handling content-based or complex keyword searches.
- Free Tier: 100 searches/month.
- Tavily
- Pros: Better at handling complex, content-based searches; returns detailed results with URLs and content.
- Cons: Occasionally experiences API call failures.
- Free Tier: 1000 searches/month.