Run vs RunMany
An experiment: how much faster is run_many() than calling run() in a
loop? This recipe runs both approaches on the same input and compares
wall time and token usage.
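The shape of the experiment can be sketched in a few lines. This is a minimal stand-in, not the cookbook's actual implementation: `run` and `run_many` here are hypothetical placeholders that simulate an I/O-bound model call, with the batched version overlapping calls via a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run(prompt: str) -> str:
    """Stand-in for a single model call (simulated network latency)."""
    time.sleep(0.05)
    return f"answer to {prompt!r}"

def run_many(prompts: list[str]) -> list[str]:
    """Stand-in for a batched call: prompts run concurrently."""
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(run, prompts))

prompts = ["summarize", "list key terms", "one-line takeaway"]

t0 = time.perf_counter()
sequential = [run(p) for p in prompts]
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
batched = run_many(prompts)
t_batch = time.perf_counter() - t0

print(f"sequential: {t_seq:.2f}s  batched: {t_batch:.2f}s  "
      f"speedup: {t_seq / t_batch:.1f}x")
```

The real recipe measures the same two paths against actual provider calls, so the absolute numbers differ, but the comparison logic is the same.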
Run It
python -m cookbook optimization/run-vs-run-many \
--input cookbook/data/demo/text-medium/input.txt --mock
Real API:
python -m cookbook optimization/run-vs-run-many \
--input path/to/file.pdf --no-mock --provider gemini --model gemini-2.5-flash-lite
What You'll See
Sequential run() loop (3 prompts):
Wall time: 4.2s | Tokens: 3,450
Batched run_many() (3 prompts):
Wall time: 1.8s | Tokens: 3,420
Answers: 3 / 3
Speedup: 2.3x
In real mode, run_many() is typically faster because it shares uploads
across prompts and runs them concurrently. In --mock mode the speedup is
close to 1x, which is expected: there is no real network cost to amortize.
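A back-of-the-envelope cost model shows where the speedup comes from. The latencies below are made up for illustration; the point is that the sequential loop pays the upload once per prompt, while the batched path pays it once and overlaps the prompts.

```python
# Hypothetical per-call latencies, in seconds.
upload_s, prompt_s, n = 1.0, 0.4, 3

# Sequential run() loop: every call re-pays the upload.
sequential_s = n * (upload_s + prompt_s)   # 3 * 1.4 = 4.2

# Batched run_many(): one shared upload, prompts overlap,
# so total is roughly one upload plus one prompt's latency
# (plus scheduling overhead in practice).
batched_s = upload_s + prompt_s            # 1.4

print(f"speedup ~ {sequential_s / batched_s:.1f}x")
```

This also explains why the observed speedup shrinks as prompts get longer relative to the upload: the shared-upload saving becomes a smaller fraction of total time.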
Tuning
- Keep prompt count small (3-8) while iterating on quality.
- Use shorter prompts while measuring overhead.
- If answers are empty or generic, tighten prompt constraints before scaling.
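The last bullet can be automated with a rough triage pass before you scale up. This is a hypothetical heuristic, not part of the cookbook: `looks_generic` and its boilerplate markers are assumptions you would tune for your own prompts.

```python
def looks_generic(answer: str) -> bool:
    """Hypothetical heuristic: flag empty or boilerplate-sounding answers."""
    stripped = answer.strip().lower()
    return not stripped or stripped.startswith(
        ("as an ai", "i cannot", "it depends")
    )

answers = ["", "As an AI, I cannot...", "The document covers Q3 revenue."]
flagged = [a for a in answers if looks_generic(a)]
print(f"{len(flagged)} / {len(answers)} answers need tighter prompts")
```

If most answers are flagged, tighten the prompt constraints first; a faster path to bad answers is not a win.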
Next Steps
For many files, use Broadcast Process Files. For throughput tuning, see Large-Scale Fan-Out.