
Image Input

Supplying images to your agents is simple and follows the same logic as text input, with a few minor differences. There are currently three ways to provide an agent with an image:

  1. Path to a local image file
  2. URL of a publicly accessible image
  3. Base64-encoded image data
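If you accept attachments from users or other code, you may want to check which of the three forms a string is before building a message. The sketch below is a hypothetical helper (attachment_kind is not part of the library). Its rules are assumptions for illustration, not how the library itself decides: an http or https scheme means a URL, an existing file means a local path, and anything else must decode as standard base64. Note that a short plain word such as "abcd" is also valid base64, so this is a sanity check rather than a guarantee.

```python
import base64
from pathlib import Path
from urllib.parse import urlparse


def _is_local_file(value: str) -> bool:
    """Return True if value names an existing file on disk."""
    try:
        return Path(value).is_file()
    except (OSError, ValueError):
        # Long base64 payloads can exceed the OS filename limit; NUL bytes are invalid in paths
        return False


def attachment_kind(value: str) -> str:
    """Classify an attachment string as "url", "local", or "base64" (hypothetical helper)."""
    if not value:
        raise ValueError("Empty attachment string")
    # 1. Public image online: an http(s) URL
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    # 2. Local image file: a path that exists on disk
    if _is_local_file(value):
        return "local"
    # 3. Base64-encoded data: must use the standard alphabet with correct padding
    try:
        base64.b64decode(value, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        raise ValueError(f"Not a URL, an existing file, or valid base64: {value[:40]!r}") from None
    return "base64"
```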

The examples below show each method, followed by an example with several images. Pass the image to the attachment parameter of your UserMessage.

Acceptable attachment values

The attachment can be a str or a list[str]. Pass a list to send several images in a single message.

import railtracks as rt

# Option 1: public image URL
message_history = rt.llm.MessageHistory(
    [
        rt.llm.UserMessage(
            content="What is the image below showing?",
            attachment="https://cdn.britannica.com/39/226539-050-D21D7721/Portrait-of-a-cat-with-whiskers-visible.jpg",
        )
    ]
)
# Option 2: local image file
message_history = rt.llm.MessageHistory(
    [
        rt.llm.UserMessage(
            content="What is the image below showing?",
            attachment="path/to/local/image.jpg",  # replace with a valid local file path
        )
    ]
)
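# Option 3: base64-encoded image data
import base64

# Read the image file as raw bytes (replace "random.png" with your own image)
with open("random.png", "rb") as f:
    img_bytes = f.read()

# Encode the bytes as base64 and decode the result to a plain str
img_b64 = base64.b64encode(img_bytes).decode("utf-8")

# Create the message history with the encoded image as the attachment
message_history = rt.llm.MessageHistory(
    [
        rt.llm.UserMessage(
            content="What is the image below showing?",
            attachment=img_b64,
        )
    ]
)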
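# Multiple images: pass a list[str] mixing any of the forms above
message_history = rt.llm.MessageHistory(
    [
        rt.llm.UserMessage(
            content="What is the image below showing?",
            attachment=[
                "https://cdn.britannica.com/39/226539-050-D21D7721/Portrait-of-a-cat-with-whiskers-visible.jpg",
                "path/to/local/image.jpg",  # replace with a valid local file path
            ],
        )
    ]
)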
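Because attachment accepts a list[str], you can also build the list programmatically. The sketch below gathers every image file in a folder into a sorted list of paths. The helper name collect_image_paths and the set of extensions are hypothetical; adjust the extensions to whatever your LLM actually accepts.

```python
from pathlib import Path

# Hypothetical set of extensions to pick up; adjust to what your LLM accepts
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def collect_image_paths(folder: str) -> list[str]:
    """Return sorted paths of the image files directly inside folder."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        # Skip subdirectories and non-image files; compare extensions case-insensitively
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

You could then pass attachment=collect_image_paths("path/to/images") to UserMessage. Sorting keeps the image order stable between runs, which matters if your prompt refers to "the first image" or "the second image".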

The rest of the invocation (tool calling, structured output, etc.) remains the same.

File Types

Supported file types depend on the underlying LLM: any format that model accepts can be used.
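Since support varies by model, a cheap pre-check can catch unsupported formats before a request is sent. The sketch below guesses the MIME type from the file extension using Python's standard mimetypes module. The helper is_supported_image and the contents of SUPPORTED_MIME_TYPES are assumptions; replace the set with the formats your provider documents. It only works for local paths and URLs, since base64 strings carry no extension.

```python
import mimetypes

# Assumption: replace with the formats your chosen LLM provider documents as supported
SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def is_supported_image(path_or_url: str) -> bool:
    """Guess the MIME type from the extension and check it against SUPPORTED_MIME_TYPES."""
    mime_type, _encoding = mimetypes.guess_type(path_or_url)
    # guess_type returns None for unknown or missing extensions, which is never in the set
    return mime_type in SUPPORTED_MIME_TYPES
```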

Image Output

Image output is not currently supported natively. However, you can wrap any image-generation logic in a tool and give your agent that tool's specification to achieve this behaviour.
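The tool pattern can be sketched with a plain function. A real tool would call an image-generation model or API; as a stand-in that runs anywhere, this hypothetical generate_image only writes a solid-colour PNG using the standard library. What matters is the shape: typed parameters and a docstring describing them (the kind of information a tool specification needs), and a text return value (the saved file path) that the agent can pass back to the user instead of raw image bytes. Registering the function as a tool uses the library's usual tool-calling mechanism and is not shown here.

```python
import struct
import zlib
from pathlib import Path


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, then a CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def generate_image(color_hex: str, width: int = 64, height: int = 64, out_path: str = "generated.png") -> str:
    """Generate a solid-colour PNG and return the absolute path of the saved file.

    Args:
        color_hex: Colour as a six-digit hex string, e.g. "#ff8800".
        width: Image width in pixels.
        height: Image height in pixels.
        out_path: File path to write the PNG to.
    """
    digits = color_hex.lstrip("#")
    if len(digits) != 6:
        raise ValueError("color_hex must look like '#rrggbb'")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    # Each scanline starts with filter type 0 (None), followed by one RGB triple per pixel
    row = b"\x00" + bytes((r, g, b)) * width
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), default compression/filter, no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    png = (
        b"\x89PNG\r\n\x1a\n"
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(row * height))
        + _png_chunk(b"IEND", b"")
    )
    path = Path(out_path)
    path.write_bytes(png)
    return str(path.resolve())
```

Returning a path keeps the agent's output text-only, which fits the current text-based output; swapping the body for a call to a real image model leaves the tool's interface unchanged.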