1. Chunking
We can select the chunk size through simple experiments.
Reference: Evaluating the Ideal Chunk Size for a RAG System using LlamaIndex (www.llamaindex.ai)
import nest_asyncio
nest_asyncio.apply()
from llama_index import (
SimpleDirectoryReader,
VectorStoreIndex,
ServiceContext,
)
from llama_index.evaluation import (
DatasetGenerator,
FaithfulnessEvaluator,
RelevancyEvaluator
)
from llama_index.llms import OpenAI
import openai
import time
openai.api_key = 'OPENAI-API-KEY'
# Download Data
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/jerryjliu/llama_index/main/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
# Load Data
reader = SimpleDirectoryReader("./data/10k/")
documents = reader.load_data()
# To evaluate each chunk size, we first generate a set of 20 evaluation questions from the first 20 pages.
eval_documents = documents[:20]
data_generator = DatasetGenerator.from_documents(eval_documents)
eval_questions = data_generator.generate_questions_from_nodes(num=20)
# We will use GPT-4 for evaluating the responses
gpt4 = OpenAI(temperature=0, model="gpt-4")
# Define service context for GPT-4 for evaluation
service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4)
# Define Faithfulness and Relevancy Evaluators which are based on GPT-4
faithfulness_gpt4 = FaithfulnessEvaluator(service_context=service_context_gpt4)
relevancy_gpt4 = RelevancyEvaluator(service_context=service_context_gpt4)
# Define a function to calculate average response time, average faithfulness, and average relevancy for a given chunk size
def evaluate_response_time_and_accuracy(chunk_size):
    total_response_time = 0
    total_faithfulness = 0
    total_relevancy = 0
    # Create a vector index with the given chunk size
    llm = OpenAI(model="gpt-3.5-turbo")
    # https://docs.llamaindex.ai/en/v0.9.48/module_guides/supporting_modules/service_context.html
    # ServiceContext lets us set the llm, embed_model, and text_splitter
    service_context = ServiceContext.from_defaults(llm=llm, chunk_size=chunk_size)
    vector_index = VectorStoreIndex.from_documents(
        eval_documents, service_context=service_context
    )
    query_engine = vector_index.as_query_engine()
    num_questions = len(eval_questions)
    for question in eval_questions:
        start_time = time.time()
        response_vector = query_engine.query(question)
        elapsed_time = time.time() - start_time
        # .passing is a boolean, so summing it counts how many responses passed each check
        faithfulness_result = faithfulness_gpt4.evaluate_response(
            response=response_vector
        ).passing
        relevancy_result = relevancy_gpt4.evaluate_response(
            query=question, response=response_vector
        ).passing
        total_response_time += elapsed_time
        total_faithfulness += faithfulness_result
        total_relevancy += relevancy_result
    average_response_time = total_response_time / num_questions
    average_faithfulness = total_faithfulness / num_questions
    average_relevancy = total_relevancy / num_questions
    return average_response_time, average_faithfulness, average_relevancy
# Iterate over different chunk sizes and compare the metrics to help choose a chunk size
for chunk_size in [128, 256, 512, 1024, 2048]:
    avg_time, avg_faithfulness, avg_relevancy = evaluate_response_time_and_accuracy(chunk_size)
    print(f"Chunk size {chunk_size} - Average Response time: {avg_time:.2f}s, Average Faithfulness: {avg_faithfulness:.2f}, Average Relevancy: {avg_relevancy:.2f}")

1.1 Chunking Methods
a. Fixed-size Chunking:
- Computationally cheap, saves processing power, and is easy to use.
- Typically combined with overlap so context is not lost at chunk boundaries (see the sketch after this list).
b. "Context-aware" Chunking
- Sentence Splitting
- Many models are optimized for embedding sentence-level context.
- Libraries: NLTK, spaCy
- Recursive Chunking
- Recursive splitting: It tries to split the text using a hierarchy of separators (e.g., paragraph breaks, sentences, words) in a recursive way.
- Intelligent chunking: The goal is to make chunks that are under a certain size (in characters or tokens) while preserving as much semantic meaning as possible.
- Overlap support: It can create overlapping chunks to preserve context between segments.
- Specialized Chunking: for structured and formatted content
- from langchain.text_splitter import MarkdownTextSplitter
- from langchain.text_splitter import LatexTextSplitter
c. Multi-Modal Chunking
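Below is a minimal sketch of the fixed-size, sentence-level, recursive, and Markdown-aware splitting methods listed above. It assumes LangChain's langchain.text_splitter module (as in the imports above) and NLTK with the punkt tokenizer downloaded; the chunk sizes and sample text are illustrative only.
from langchain.text_splitter import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
    MarkdownTextSplitter,
)
import nltk
from nltk.tokenize import sent_tokenize
nltk.download("punkt")  # one-time download for the sentence tokenizer

# Reuse the first page loaded earlier with SimpleDirectoryReader
text = documents[0].text

# a. Fixed-size chunking with overlap
fixed_splitter = CharacterTextSplitter(separator="\n\n", chunk_size=512, chunk_overlap=64)
fixed_chunks = fixed_splitter.split_text(text)

# b. Sentence splitting: embed each sentence (or regrouped sentences) individually
sentences = sent_tokenize(text)

# b. Recursive chunking: tries "\n\n", "\n", " ", "" in turn until chunks fit the size limit
recursive_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)
recursive_chunks = recursive_splitter.split_text(text)

# Specialized chunking for Markdown-formatted content
md_splitter = MarkdownTextSplitter(chunk_size=512, chunk_overlap=0)
md_chunks = md_splitter.split_text("# Title\n\nSome Markdown body text...")

print(len(fixed_chunks), len(sentences), len(recursive_chunks), len(md_chunks))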
2. Namespaces
Reference: Understanding indexes - Pinecone Docs (docs.pinecone.io)
Within an index, records are partitioned into namespaces, and all upserts, queries, and other data operations always target one namespace.
This has two main benefits:
- Multitenancy: When you need to isolate data between customers, you can use one namespace per customer and target each customer’s writes and queries to their dedicated namespace. See Implement multitenancy for end-to-end guidance.
- Faster queries: When you divide records into namespaces in a logical way, you speed up queries by ensuring only relevant records are scanned. The same applies to fetching records, listing record IDs, and other data operations.
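A minimal sketch of namespace-scoped writes and reads, assuming the Pinecone Python client (v3+), an existing 1536-dimensional index named "my-index", and placeholder IDs and vector values; all names here are illustrative.
from pinecone import Pinecone

pc = Pinecone(api_key="PINECONE-API-KEY")
index = pc.Index("my-index")  # hypothetical existing index

# Upserts target one namespace (e.g., one namespace per customer)
index.upsert(
    vectors=[{"id": "doc-1", "values": [0.1] * 1536, "metadata": {"source": "uber_2021.pdf"}}],
    namespace="customer-a",
)

# Queries scan only the records in the targeted namespace
results = index.query(
    vector=[0.1] * 1536,
    top_k=5,
    include_metadata=True,
    namespace="customer-a",
)
print(results)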

Practical Advantages of Namespaces
- Efficiency
- Faster vector search because you can narrow the search space just by specifying namespace.
- Simplicity
- You manage only one index.
- No need to create, monitor, backup, or version many different indexes.
- Cost Saving
- Especially in Pinecone, each index requires baseline server resources (e.g., each index consumes a minimum amount of RAM/storage, even if empty). Using namespaces avoids this unnecessary overhead.
- Faster Development
- Easier to batch upserts, queries, deletions because you're always talking to the same index.
- Multi-Tenancy
- If you have a multi-user application (e.g., a chatbot serving multiple companies), you can isolate each company’s data in different namespaces without spinning up new indexes.
When NOT to use namespaces:
- If you truly need different dimensions (e.g., some embeddings are 512-dim, others are 768-dim).
- If you need full isolation for compliance (e.g., legally mandated separation).
- If indexes are so big that even one namespace becomes huge (then sharding across indexes may be better).
Implementation
Reference: Chroma Docs - Getting Started (https://docs.trychroma.com/docs/overview/getting-started)
Reference: 01. Chroma - Creating a VectorStore (https://wikidocs.net/234094)
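Following the getting-started guide linked above, here is a minimal sketch of creating a Chroma collection, adding documents, and querying it; the collection name and sample documents are placeholders, and the default in-memory client and embedding function are assumed.
import chromadb

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path="./chroma_db") to persist
collection = client.create_collection(name="uber_10k")  # hypothetical collection name

# Chroma embeds the documents with its default embedding function unless one is supplied
collection.add(
    documents=["Namespaces partition records within a single index.", "Chunk size affects retrieval quality and latency."],
    ids=["doc-1", "doc-2"],
)

# Query returns the nearest documents to the query text
results = collection.query(query_texts=["What do namespaces do?"], n_results=1)
print(results["documents"])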