ToolUseEvaluator

The ToolUseEvaluator assesses past agent runs and reports per-tool invocation counts, failure rates, and runtimes.

Usage

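from railtracks import evaluations as evals

# Extract agent data points from the saved session files.
data = evals.extract_agent_data_points(".railtracks/data/sessions/")

# Run the tool-use evaluator over the extracted data.
evaluator = evals.ToolUseEvaluator()
results = evals.evaluate(data=data, evaluators=[evaluator])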

Metrics Tracked

| Metric | Description |
| --- | --- |
| UsageCount | Number of times each tool was called per agent run. |
| FailureRate | Fraction of calls that failed (0.0–1.0) per agent run. |
| Runtime | Wall-clock execution time per individual tool call (seconds). |
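
To make the metric definitions above concrete, here is a minimal sketch of how the three values could be derived from raw tool-call records for a single agent run. It does not use the railtracks API: the `ToolCall` record and the `summarize_run` helper are hypothetical names introduced only for illustration, and the evaluator's real internal data structures may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical record of one tool call (not a railtracks type)."""
    tool: str
    succeeded: bool
    runtime_s: float  # wall-clock seconds for this call


def summarize_run(calls):
    """Compute per-tool usage count, failure rate, and runtimes for one agent run."""
    # Group the calls by tool name.
    grouped = defaultdict(list)
    for call in calls:
        grouped[call.tool].append(call)

    summary = {}
    for tool, tool_calls in grouped.items():
        failures = sum(1 for c in tool_calls if not c.succeeded)
        summary[tool] = {
            "usage_count": len(tool_calls),               # calls per run
            "failure_rate": failures / len(tool_calls),   # fraction in 0.0-1.0
            "runtimes": [c.runtime_s for c in tool_calls],  # one entry per call
        }
    return summary


# Example run: two calls to "search" (one failed) and one to "calculator".
run = [
    ToolCall("search", True, 0.5),
    ToolCall("search", False, 1.5),
    ToolCall("calculator", True, 0.25),
]
summary = summarize_run(run)
```

In this sketch the usage count and failure rate are aggregated per tool within one run, while runtimes stay as one value per call, mirroring the granularity given in the table.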