
Run vs RunMany

An experiment: how much faster is run_many() compared to calling run() in a loop? This recipe runs both approaches on the same input and compares wall time and token usage.
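The comparison boils down to timing the same prompts both ways. The sketch below shows the measurement pattern only; the client object and its run()/run_many() signatures (including the file= keyword) are assumptions standing in for whatever SDK this recipe wraps, not the recipe's actual code.

import time

def benchmark(client, prompts, source):
    # Sequential baseline: one run() call per prompt.
    t0 = time.perf_counter()
    sequential = [client.run(prompt, file=source) for prompt in prompts]
    sequential_s = time.perf_counter() - t0

    # Batched: a single run_many() call over the same prompts.
    t0 = time.perf_counter()
    batched = client.run_many(prompts, file=source)
    batched_s = time.perf_counter() - t0

    print(f"Sequential: {sequential_s:.1f}s | Batched: {batched_s:.1f}s | "
          f"Speedup: {sequential_s / batched_s:.1f}x")
    return sequential, batched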

Run It

python -m cookbook optimization/run-vs-run-many \
  --input cookbook/data/demo/text-medium/input.txt --mock

Real API:

python -m cookbook optimization/run-vs-run-many \
  --input path/to/file.pdf --no-mock --provider gemini --model gemini-2.5-flash-lite

What You'll See

Sequential run() loop (3 prompts):
  Wall time: 4.2s | Tokens: 3,450

Batched run_many() (3 prompts):
  Wall time: 1.8s | Tokens: 3,420
  Answers: 3 / 3

Speedup: 2.3x

In real mode, run_many() is typically faster because it shares uploads and runs prompts concurrently. In --mock mode there is no real network cost, so the speedup stays close to 1x; that's expected.
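The concurrency effect is easy to see in isolation. This self-contained snippet is not the library's code; it simulates three 1-second "network calls" and shows that running them concurrently takes roughly the longest single call, while a sequential loop takes the sum.

import asyncio
import time

async def fake_call(i):
    await asyncio.sleep(1.0)  # stand-in for per-prompt model latency
    return f"answer {i}"

async def main():
    # Sequential: ~3s total, one call after another.
    t0 = time.perf_counter()
    for i in range(3):
        await fake_call(i)
    print(f"sequential: {time.perf_counter() - t0:.1f}s")

    # Concurrent: ~1s total, all calls in flight at once.
    t0 = time.perf_counter()
    await asyncio.gather(*(fake_call(i) for i in range(3)))
    print(f"concurrent: {time.perf_counter() - t0:.1f}s")

asyncio.run(main())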

Tuning

  • Keep prompt count small (3-8) while iterating on quality; the sketch after this list sweeps that range.
  • Use shorter prompts while measuring overhead.
  • If answers are empty or generic, tighten prompt constraints before scaling.
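A quick way to act on the first two tips is to sweep a few prompt counts with a deliberately short prompt and watch how per-prompt time changes. As before, client, run_many(), and the file= keyword are assumed names used only to illustrate the measurement.

import time

def sweep(client, source, counts=(3, 5, 8)):
    prompt = "List three key facts from this document."  # short, to focus on overhead
    for n in counts:
        t0 = time.perf_counter()
        client.run_many([prompt] * n, file=source)
        elapsed = time.perf_counter() - t0
        print(f"{n} prompts: {elapsed:.1f}s total, {elapsed / n:.2f}s per prompt")

If per-prompt time drops as the count grows, the fixed costs (upload, setup) are being amortized as intended.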

Next Steps

For many files, use Broadcast Process Files. For throughput tuning, see Large-Scale Fan-Out.