Retrieval-Augmented Generation (RAG)

Need your AI agents to access your company's knowledge base, docs, or any private data? RAG is your answer! It enables grounded answering by retrieving relevant snippets from your documents and composing them into LLM prompts.

RAG ingests documents, chunks them, embeds the chunks into vectors, stores those vectors, and retrieves the most relevant snippets at query time—all automatically handled for you.


Two Ways to Get Started

We offer two approaches to integrate RAG into your application:

Choose Your Adventure

  1. Quick & Easy: Use the prebuilt rag_node for instant setup
  2. Full Control: Build a custom RAG node using the RAG class for maximum flexibility

Option 1: Prebuilt rag_node (Quick & Easy)

Perfect for getting started quickly! Wrap your RAG index into a callable node so other nodes and LLMs can retrieve relevant context and compose prompts.

import railtracks as rt
from railtracks.prebuilt import rag_node
from railtracks.llm import OpenAILLM

# 1) Build the retrieval node
retriever = rag_node([
    "Our company policy requires all employees to work from home on Fridays",
    "Data security guidelines mandate encryption of all sensitive customer information",
    "Employee handbook states vacation requests need 2 weeks advance notice"
])

# 2) Create Agent
agent = rt.agent_node(
    llm=OpenAILLM("gpt-4o"),
)

# 3) Run the agent.
@rt.session()
async def main():
    question = "What is the work from home policy?"
    search_result = await rt.call(retriever, question, top_k=2)
    context = "\n\n".join(search_result.to_list_of_texts())

    response = await rt.call(
        agent,
        user_input=(
            "Based on the following context, please answer the question.\n"
            "Context:\n"
            f"{context}\n"
            "Question:\n"
            f"{question}\n"
            "Answer based only on the context provided."
            "If the answer is not in the context, say \"I don't know\"."
        )
    )
    return response
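
If @rt.session() leaves main() as an ordinary coroutine function (an assumption here; check the railtracks docs for the recommended entry point), you can run the example with asyncio:

import asyncio

# Assumption: the decorated main() is still awaitable like a normal coroutine.
asyncio.run(main())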

File Loading Made Easy

The rag_node function accepts raw text content only. For file loading, read the files first:

from railtracks.rag.utils import read_file

# Read each file separately so a missing file only affects its own content
try:
    doc1_content = read_file("./docs/faq.txt")
except FileNotFoundError:
    doc1_content = "FAQ file not found. Please ensure docs/faq.txt exists."

try:
    doc2_content = read_file("./docs/policies.txt")
except FileNotFoundError:
    doc2_content = "Policies file not found. Please ensure docs/policies.txt exists."

# Build retriever with file contents
retriever = rag_node([
    doc1_content,
    doc2_content
])
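
To index a whole folder, you can glob the files and pass all of their contents in one call. A minimal sketch using only the standard library plus read_file (the ./docs layout and *.txt extension are assumptions for illustration):

from pathlib import Path

from railtracks.prebuilt import rag_node
from railtracks.rag.utils import read_file

# Collect every .txt file under ./docs (hypothetical layout) and read its text
doc_paths = sorted(Path("./docs").glob("*.txt"))
documents = [read_file(str(path)) for path in doc_paths]

# Build one retriever over the whole folder
retriever = rag_node(documents)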

Option 2: Custom RAG Node (Advanced)

For maximum control and customization, build your own RAG node.

import railtracks as rt
from railtracks.rag.rag_core import RAG, RAGConfig, SearchResult

rag_core = RAG(
    docs=["<Your text here>", "..."],
    config=RAGConfig(
        embedding={"model": "text-embedding-3-small"},
        store={},
        chunking={
            "chunk_size": 1000,
            "chunk_overlap": 200,
            "model": "gpt-4o",
        },
    ),
)
rag_core.embed_all()

@rt.function_node
async def custom_rag_node(query: str) -> SearchResult:
    """A custom RAG function node that retrieves documents based on a query."""
    return rag_core.search(query, top_k=5)
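
The custom node is called the same way as the prebuilt one. A minimal usage sketch, assuming function nodes are invoked with rt.call and return the SearchResult shown above:

@rt.session()
async def demo():
    # Retrieve the five most relevant chunks for the query
    result = await rt.call(custom_rag_node, "What is the vacation policy?")
    for text in result.to_list_of_texts():
        print(text)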

Pro Tips

  • The callable node accepts a query and an optional top_k to control the number of retrieved chunks.
  • SearchResult can be converted to plain text with .to_list_of_texts().
  • You can inspect the result object for similarity scores and metadata.

Chunking Strategy

Best Practices:

  • chunk_size: Number of tokens per chunk (approximate, based on token_count_model)
  • chunk_overlap: Number of tokens to overlap between adjacent chunks
  • Sweet spot: Start with 600-1200 tokens and 10-20% overlap (see the sketch below)
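
For example, a configuration in that range, reusing the RAGConfig keys from the Option 2 example (the exact numbers are just one reasonable starting point):

from railtracks.rag.rag_core import RAGConfig

# ~800-token chunks with ~15% overlap (120 tokens)
config = RAGConfig(
    embedding={"model": "text-embedding-3-small"},
    store={},
    chunking={
        "chunk_size": 800,
        "chunk_overlap": 120,
        "model": "gpt-4o",
    },
)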

Embeddings

Model Selection:

  • "text-embedding-3-small" is a good default for many use cases (balance of quality and cost)
  • Upgrade to stronger models for nuanced or specialized domains
  • Configure the model via the embedding entry of RAGConfig (see the Option 2 example)

Vector Store Options

Storage Recommendations:

  • In-memory by default (perfect for development and tests)
  • For larger corpora: Consider FAISS/Qdrant or other backends supported by create_store
  • Production: use persistent storage so the index survives restarts and documents don't need to be re-embedded

Top-k Retrieval

Finding the Right Balance:

  • Typical values: 3–5 chunks
  • Increase if your content is highly fragmented or diverse
  • Monitor token usage: larger chunk sizes and higher top_k values increase memory and token consumption (worked example below)
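
As a rough budget check: top_k=5 with 1000-token chunks puts about 5,000 tokens of retrieved context into every prompt, before the question and instructions are added.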

Features & Concepts

External Libraries

Powered By:

  • LiteLLM - Embeddings and chat transport

Optional Vector Store Backends:

  • FAISS - Fast similarity search
  • Qdrant - Vector database