Streaming

What Is Streaming?

Streaming is a way to make your agent feel more responsive. Instead of waiting for the complete response, you can stream intermediate results as they arrive.

Streaming Support

Railtracks supports streaming responses from your agent. To enable it, set stream=True when creating your LLM:

import railtracks.llm as llm

model = llm.OpenAILLM(model_name="gpt-4o", stream=True)

When you call the LLM, it will return a generator that you can iterate through:

model = llm.OpenAILLM(model_name="gpt-4o", stream=True)

response = model.chat(llm.MessageHistory([
    llm.UserMessage("Tell me who you are"),
]))

# The response object is an iterator that yields string chunks and ends with the complete message.
for chunk in response:
    print(chunk)
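
If you want to keep the incremental text separate from the final result, one pattern is to branch on the item type. This is a minimal sketch only: it assumes the final item is a message object rather than a plain string chunk, so check the actual types returned by your version before relying on it.

chunks = []
final_message = None
for item in response:
    if isinstance(item, str):
        # Intermediate text chunk: print it as it arrives.
        chunks.append(item)
        print(item, end="", flush=True)
    else:
        # Assumed: the last item is the complete message object.
        final_message = item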

Agent Support

Agents in Railtracks also support streamed responses. When creating your agent, you provide an LLM with streaming enabled:

import railtracks as rt

agent = rt.agent_node(
    llm=rt.llm.OpenAILLM(model_name="gpt-4o", stream=True),
)

The agent's output is a generator that yields a sequence of string chunks, followed by the complete message.

Usage

agent = rt.agent_node(
    llm=rt.llm.OpenAILLM(model_name="gpt-4o", stream=True),
)

@rt.session
async def main():
    result = await rt.call(agent, rt.llm.MessageHistory([
        rt.llm.UserMessage("Tell me who you are"),
    ]))

    # The response object is an iterator that yields string chunks and ends with the complete message.

    for chunk in result:
        print(chunk)

Warning

When using streaming, fully exhaust the returned generator within the session. If you consume it outside the session, the visualizer suite will not work as expected.
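
For example, here is a minimal sketch (reusing the agent defined above) that consumes the stream inside the session and returns the collected items for use afterwards:

@rt.session
async def main():
    result = await rt.call(agent, rt.llm.MessageHistory([
        rt.llm.UserMessage("Tell me who you are"),
    ]))

    # Exhaust the generator while the session is still active,
    # then hand the collected chunks back to the caller.
    return list(result)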

Warning

Streaming is not currently supported for tool-calling agents. See issue #756.