# Retrieval-Augmented Generation (RAG)
Need your AI agents to access your company's knowledge base, docs, or any private data? RAG is your answer! It enables grounded answering by retrieving relevant snippets from your documents and composing them into LLM prompts.
RAG ingests documents, chunks them, embeds the chunks into vectors, stores those vectors, and retrieves the most relevant snippets at query time—all automatically handled for you.
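Under the hood this is just "chunk, embed, store, retrieve". The toy sketch below illustrates the idea with a bag-of-words stand-in for a real embedding model; it is purely conceptual and is not the library's implementation. Everything it does by hand is handled for you by the nodes described on this page.

```python
# Conceptual sketch of the RAG pipeline -- the library does the real work
# with proper embedding models and vector stores.
from collections import Counter
import math

def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector (real systems use an embedding model)."""
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "Employees work from home on Fridays.",
    "Vacation requests need 2 weeks advance notice.",
]

# Ingest: chunk each document, embed each chunk, keep (vector, text) pairs.
store = [(embed(c), c) for doc in documents for c in chunk(doc)]

# Query time: retrieve the most similar chunks and paste them into the prompt.
query = "What is the work from home policy?"
top = sorted(store, key=lambda item: similarity(embed(query), item[0]), reverse=True)[:2]
context = "\n\n".join(text for _, text in top)
print(context)
```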
## Two Ways to Get Started
We offer two approaches to integrate RAG into your application:
**Choose Your Adventure**

- **Quick & Easy**: Use the prebuilt `rag_node` for instant setup
- **Full Control**: Build a custom RAG node using the `RAG` class for maximum flexibility
## Option 1: Prebuilt RAG Node (Recommended)
Perfect for getting started quickly! Wrap your RAG index into a callable node so other nodes and LLMs can retrieve relevant context and compose prompts.
```python
import railtracks as rt
from railtracks.prebuilt import rag_node
from railtracks.llm import OpenAILLM

# 1) Build the retrieval node from raw text snippets
retriever = rag_node([
    "Our company policy requires all employees to work from home on Fridays",
    "Data security guidelines mandate encryption of all sensitive customer information",
    "Employee handbook states vacation requests need 2 weeks advance notice",
])

# 2) Create the agent
agent = rt.agent_node(
    llm=OpenAILLM("gpt-4o"),
)

# 3) Run the agent
@rt.session()
async def main():
    question = "What is the work from home policy?"

    # Retrieve the two most relevant chunks for the question
    search_result = await rt.call(retriever, question, top_k=2)
    context = "\n\n".join(search_result.to_list_of_texts())

    # Compose a grounded prompt and ask the agent
    response = await rt.call(
        agent,
        user_input=(
            "Based on the following context, please answer the question.\n"
            "Context:\n"
            f"{context}\n"
            "Question:\n"
            f"{question}\n"
            "Answer based only on the context provided. "
            "If the answer is not in the context, say \"I don't know\"."
        ),
    )
    return response
```
### File Loading Made Easy
The `rag_node` function accepts raw text content only. For file loading, read the files first:
```python
from railtracks.prebuilt import rag_node
from railtracks.rag.utils import read_file

# Read file contents manually
try:
    doc1_content = read_file("./docs/faq.txt")
    doc2_content = read_file("./docs/policies.txt")
except FileNotFoundError:
    doc1_content = "FAQ file not found. Please ensure docs/faq.txt exists."
    doc2_content = "Policies file not found. Please ensure docs/policies.txt exists."

# Build the retriever with the file contents
retriever = rag_node([
    doc1_content,
    doc2_content,
])
```
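If you have a whole folder of documents, you can combine `read_file` with standard-library globbing. A minimal sketch, assuming your files live under `./docs` as plain text:

```python
from pathlib import Path

from railtracks.prebuilt import rag_node
from railtracks.rag.utils import read_file

# Collect the contents of every .txt file under ./docs
doc_contents = []
for path in sorted(Path("./docs").glob("*.txt")):
    try:
        doc_contents.append(read_file(str(path)))
    except OSError:
        print(f"Skipping {path}: file could not be read")

retriever = rag_node(doc_contents)
```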
## Option 2: Custom RAG Node (Advanced)
For maximum control and customization, build your own RAG node.
```python
import railtracks as rt
from railtracks.rag.rag_core import RAG, RAGConfig, SearchResult

# Configure chunking, embedding, and the vector store explicitly
rag_core = RAG(
    docs=["<Your text here>", "..."],
    config=RAGConfig(
        embedding={"model": "text-embedding-3-small"},
        store={},
        chunking={
            "chunk_size": 1000,
            "chunk_overlap": 200,
            "model": "gpt-4o",
        },
    ),
)

# Chunk and embed every document up front so searches hit a populated store
rag_core.embed_all()

@rt.function_node
async def custom_rag_node(query: str) -> SearchResult:
    """A custom RAG function node that retrieves documents based on a query."""
    return rag_core.search(query, top_k=5)
```
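The custom node is then called like any other node. A minimal usage sketch, mirroring the prebuilt example above:

```python
@rt.session()
async def main():
    query = "What does the handbook say about vacation requests?"

    # Call the custom node and turn the results into prompt-ready text
    search_result = await rt.call(custom_rag_node, query)
    context = "\n\n".join(search_result.to_list_of_texts())
    print(context)
```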
### Pro Tips

- The callable node accepts `query` and an optional `top_k` to control the number of retrieved chunks.
- `SearchResult` can be converted to plain text using `.to_list_of_texts()`.
- You can inspect the `SearchResult` object for similarity scores and metadata (see the sketch below).
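For example, inside a session function such as `main()` above (the exact fields carried by each result depend on the `SearchResult` implementation, so treat the inspection here as a sketch):

```python
# Retrieve a few chunks and look at what came back
search_result = await rt.call(retriever, "How much notice do vacation requests need?", top_k=3)

# Plain text, ready to paste into a prompt
for text in search_result.to_list_of_texts():
    print(text)

# The SearchResult object itself carries similarity scores and metadata;
# print it (or explore it in a REPL/debugger) to see what is available.
print(search_result)
```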
### Chunking Strategy

**Best Practices:**

- `chunk_size`: number of tokens per chunk (approximate, based on `token_count_model`)
- `chunk_overlap`: number of tokens to overlap between adjacent chunks
- Sweet spot: start with 600-1200 tokens and 10-20% overlap (see the example after this list)
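For example, a mid-range setting plugged into the `RAGConfig` from Option 2 (the values are illustrative starting points, not library defaults):

```python
from railtracks.rag.rag_core import RAG, RAGConfig

rag_core = RAG(
    docs=["<Your text here>"],
    config=RAGConfig(
        embedding={"model": "text-embedding-3-small"},
        store={},
        chunking={
            "chunk_size": 800,     # within the 600-1200 token sweet spot
            "chunk_overlap": 120,  # roughly 15% overlap between adjacent chunks
            "model": "gpt-4o",     # token-counting model, as in the Option 2 example
        },
    ),
)
```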
### Embeddings

**Model Selection:**

- `"text-embedding-3-small"` is a good default for many use cases (a solid balance of quality and cost)
- Upgrade to stronger models for nuanced or specialized domains (see the sketch after this list)
- Configure via `embed_config`
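To switch models, change the embedding entry in your config. A short sketch using the `RAGConfig` layout from Option 2, with `text-embedding-3-large` as one example of a stronger model:

```python
from railtracks.rag.rag_core import RAGConfig

config = RAGConfig(
    embedding={"model": "text-embedding-3-large"},  # stronger model for nuanced domains
    store={},
    chunking={"chunk_size": 1000, "chunk_overlap": 200, "model": "gpt-4o"},
)
```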
### Vector Store Options

**Storage Recommendations:**

- In-memory by default (perfect for development and tests)
- For larger corpora: consider FAISS, Qdrant, or other backends supported by `create_store`
- Production: use a persistent store so the index survives restarts and scales beyond available memory
### Top-k Retrieval

**Finding the Right Balance:**

- Typical values: 3–5 chunks
- Increase if your content is highly fragmented or diverse
- Monitor token usage: larger chunk sizes and higher `top_k` values increase memory and token consumption
## Related Documentation

### Features & Concepts
### External Libraries

**Powered By:**

- LiteLLM - Embeddings and chat transport

**Optional Vector Store Backends:**