Configuring Pollux¶
You need to tell Pollux which provider, model, and API key to use. The
Config object captures these choices explicitly. No global state, no
implicit defaults.
Boundary
Pollux owns: validating config, resolving API keys from the environment, managing retry logic, and enforcing provider constraints.
You own: selecting the right provider and model, managing API keys securely, and tuning concurrency and retry settings for your workload.
Config Fields¶
```python
from pollux import Config

config = Config(
    provider="gemini",
    model="gemini-2.5-flash-lite",
)
```
All fields and their defaults:
| Field | Type | Default | Description |
|---|---|---|---|
| `provider` | `"gemini" \| "openai" \| "anthropic" \| "openrouter"` | (required) | Provider to use |
| `model` | `str` | (required) | Model identifier |
| `api_key` | `str \| None` | `None` | Explicit key; auto-resolved from env if omitted |
| `use_mock` | `bool` | `False` | Use mock provider (no network calls) |
| `request_concurrency` | `int` | `6` | Max concurrent API calls in multi-prompt execution |
| `retry` | `RetryPolicy` | `RetryPolicy()` | Retry configuration |
API Key Resolution¶
If api_key is omitted, Pollux resolves it from environment variables:
- Gemini: `GEMINI_API_KEY`
- OpenAI: `OPENAI_API_KEY`
- Anthropic: `ANTHROPIC_API_KEY`
- OpenRouter: `OPENROUTER_API_KEY`
```bash
export GEMINI_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
export OPENROUTER_API_KEY="your-key"
```
You can also pass a key directly:
```python
config = Config(provider="openai", model="gpt-5-nano", api_key="sk-...")
```
Pollux auto-loads .env files via python-dotenv. Create a .env in your
project root for local development, but never commit it.
Mock Mode¶
Use use_mock=True for local development without external API calls:
```python
config = Config(provider="gemini", model="gemini-2.5-flash-lite", use_mock=True)
```
Mock responses echo prompts and return synthetic metrics. Useful for validating pipeline logic, testing integrations, and CI.
Performance and Cost Controls¶
```python
config = Config(
    provider="gemini",
    model="gemini-2.5-flash-lite",
    request_concurrency=6,  # Concurrent API calls
)
```
| Need | What to do |
|---|---|
| Fast iteration without API calls | `use_mock=True` |
| Reduce token spend on repeated context | Use `create_cache()`. See Reducing Costs with Context Caching |
| Higher throughput for many prompts/sources | Increase `request_concurrency` |
| Better resilience to transient failures | Customize `retry=RetryPolicy(...)` |
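Conceptually, `request_concurrency` behaves like a semaphore: at most N provider calls are in flight at once during multi-prompt execution. The sketch below illustrates that mechanism with a plain `asyncio.Semaphore`; it is not Pollux's implementation, and the function names are invented for illustration.

```python
import asyncio

async def fan_out(prompts, limit=6):
    # At most `limit` calls run concurrently, mirroring request_concurrency.
    sem = asyncio.Semaphore(limit)

    async def call(prompt):
        async with sem:
            await asyncio.sleep(0)  # stand-in for one provider API call
            return f"response to {prompt!r}"

    return await asyncio.gather(*(call(p) for p in prompts))

results = asyncio.run(fan_out(["a", "b", "c"], limit=2))
```

Raising the limit increases throughput for large fan-outs at the cost of more simultaneous load against provider rate limits.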
RetryPolicy¶
Pollux retries transient provider failures with exponential backoff, full jitter, and a wall-clock deadline:
```python
from pollux import Config, RetryPolicy

config = Config(
    provider="gemini",
    model="gemini-2.5-flash-lite",
    retry=RetryPolicy(max_attempts=3, initial_delay_s=0.25),
)
```
All RetryPolicy fields (defaults shown):
| Field | Default | Description |
|---|---|---|
| `max_attempts` | `2` | Total attempts (including the first call) |
| `initial_delay_s` | `0.5` | Delay before the first retry |
| `backoff_multiplier` | `2.0` | Multiplier applied after each retry |
| `max_delay_s` | `5.0` | Upper bound on any single retry delay |
| `jitter` | `True` | Full jitter to avoid thundering herd |
| `max_elapsed_s` | `15.0` | Wall-clock deadline; `None` to disable |
When a provider returns a Retry-After hint, Pollux respects it (whichever
is longer: the computed backoff or the server hint).
Options¶
Config establishes the infrastructure requirements for a provider connection. Options is different: it controls per-prompt inference overrides. This split lets you tune how text is generated on a call-by-call basis without tearing down or recreating the underlying client.
```python
from pollux import Options

options = Options(
    system_instruction="You are a concise analyst.",  # Optional global behavior guide
    temperature=0.7,                  # Generation tuning
    top_p=0.9,                        # Generation tuning
    tools=[{"name": "get_weather"}],  # Native tool calling
    tool_choice="auto",               # Tool calling mode ('auto', 'required', 'none', or dict)
    response_schema=MyPydanticModel,  # Structured output extraction
    reasoning_effort="medium",        # Controls model thinking depth
    max_tokens=4096,                  # Provider-specific output cap
    implicit_caching=True,            # Auto-cache prefix (Anthropic only)
)
```
| Field | Type | Default | Description |
|---|---|---|---|
| `system_instruction` | `str \| None` | `None` | Global system prompt |
| `temperature` | `float \| None` | `None` | Sampling temperature |
| `top_p` | `float \| None` | `None` | Nucleus sampling probability |
| `tools` | `list[dict] \| None` | `None` | JSON schemas for native tools. See Continuing Conversations Across Turns |
| `tool_choice` | `str \| dict \| None` | `None` | Tool execution strategy. See Building an Agent Loop |
| `response_schema` | `type[BaseModel] \| dict` | `None` | Expected JSON response format. See Extracting Structured Data |
| `reasoning_effort` | `str \| None` | `None` | Controls model thinking depth. See Writing Portable Code Across Providers |
| `max_tokens` | `int \| None` | `None` | Output-token cap with provider-specific semantics. Anthropic applies provider defaults when omitted; other providers may ignore it. See Provider Capabilities |
| `history` | `list[dict] \| None` | `None` | Conversation history. See Continuing Conversations Across Turns |
| `continue_from` | `ResultEnvelope \| None` | `None` | Resume from a prior result. See Continuing Conversations Across Turns |
| `cache` | `CacheHandle \| None` | `None` | Persistent explicit cache (Gemini). See Reducing Costs with Context Caching |
| `implicit_caching` | `bool \| None` | `None` | Enable or disable Anthropic implicit caching. Defaults to `True` for a single provider call and `False` for multi-call fan-out. `implicit_caching=True` raises on providers that do not support it. See Reducing Costs with Context Caching |
Note
OpenAI GPT-5 family models (gpt-5, gpt-5-mini, gpt-5-nano) reject
sampling controls like temperature and top_p with provider errors.
Older OpenAI models (for example gpt-4.1-nano) still accept them.
See Writing Portable Code Across Providers
for the full constraints mapping.
Note
max_tokens is not a portable "length knob" with identical behavior
everywhere. Anthropic uses it as the total output budget and applies a
provider default when you omit it. OpenRouter forwards it to the routed
model. Other providers may ignore it in the current release.
Note
Options.delivery_mode remains available only as a compatibility shim for
older code. New code should omit it.
Deferred work uses defer() / defer_many(). Legacy
delivery_mode="deferred" values still raise ConfigurationError, with
guidance that depends on which entry point you called.
Cache handle restrictions
When cache is set, system_instruction, tools, and tool_choice
must not be passed in the same Options. system_instruction and
tools can be baked into create_cache(), while tool_choice must be
set only on uncached calls. See
Reducing Costs with Context Caching for details.
Safety Notes¶
- `Config` is immutable (`frozen=True`). Create a new instance to change values.
- `Config` validates the provider name on init. Unknown providers raise `ConfigurationError` immediately rather than failing at call time.
- String representation redacts API keys.
- Missing keys in real mode raise `ConfigurationError` with actionable hints.
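Because the object is frozen, "changing" a value means building a new instance. The pattern is shown below with a plain frozen dataclass as a stand-in; whether Pollux's `Config` supports `dataclasses.replace` is an assumption, so copying the fields you need into a fresh `Config(...)` call is always safe.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FrozenConfig:  # stand-in for pollux.Config
    provider: str
    model: str

base = FrozenConfig(provider="gemini", model="gemini-2.5-flash-lite")

# Attribute assignment on a frozen instance raises; build a new one instead.
variant = replace(base, model="gemini-2.5-pro")  # base is unchanged
```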
Dev Install (Contributors)¶
See Contributing for full development setup instructions.
For the full provider feature matrix and model-specific constraints, see Provider Capabilities and Writing Portable Code Across Providers. For the deferred workflow, see Submitting Work for Later Collection.