Budget protection
Budget protection prevents unexpected costs: your agent automatically stops before spending more than you allow. Budgets are measured in tokens (the units AI providers use for billing). Comis enforces three budget windows:
- Per execution (default: 2,000,000 tokens): The maximum tokens for a single agent run. This is like a spending cap per task. If your agent hits this limit during a single conversation turn, it stops and lets you know.
- Per hour (default: 10,000,000 tokens): A rolling hourly limit. This prevents a burst of activity from running up your bill. Even if each individual execution is within budget, the hourly cap prevents a flood of requests from adding up.
- Per day (default: 100,000,000 tokens): Your absolute daily spending cap. This is the last line of defense against unexpected costs from sustained heavy usage.
What do these numbers mean for your bill?
Token costs vary by provider and model, but here are approximate ranges to give you a sense of scale:
- 2,000,000 tokens (the default per-execution limit) costs roughly $6 at a rate of about $3/million input tokens. GPT-4o is about $2.50/million input tokens. Exact prices vary, so check your provider’s pricing page.
- 10,000,000 tokens (the default hourly limit) costs roughly $15-50 per hour at sustained maximum usage. In practice, most agents use far less.
- 100,000,000 tokens (the default daily limit) represents a hard ceiling of roughly $150-500 per day, which would require sustained heavy usage to reach.
These are approximate ranges based on current pricing. Token costs vary by
provider, model, and whether tokens are input or output. Check your
provider’s pricing page for exact numbers.
When prompt caching is active, cached tokens cost 50-90% less than regular
input tokens depending on your provider. This means your actual spending is
often lower than raw token counts suggest. Comis tracks these savings
automatically — see Observability for cache
savings details.
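As a rough worked example of the arithmetic behind these ranges, the sketch below converts a token count into dollars and applies a cache discount. The helper name `estimateInputCost`, the $3 per million rate, and the 90% discount are illustrative assumptions, not Comis APIs or quoted provider prices.

```typescript
// Hypothetical helper: not part of Comis. All prices are illustrative only.
interface Usage {
  inputTokens: number;  // uncached input tokens
  cachedTokens: number; // input tokens served from the prompt cache
}

// pricePerMillion: dollars per 1,000,000 regular input tokens.
// cacheDiscount: fraction saved on cached tokens (0.5 to 0.9 per the note above).
function estimateInputCost(usage: Usage, pricePerMillion: number, cacheDiscount: number): number {
  const perToken = pricePerMillion / 1_000_000;
  const regular = usage.inputTokens * perToken;
  const cached = usage.cachedTokens * perToken * (1 - cacheDiscount);
  return regular + cached;
}

// 2,000,000 tokens with no caching, at an assumed $3 per million tokens.
const uncached = estimateInputCost({ inputTokens: 2_000_000, cachedTokens: 0 }, 3, 0.9);

// Same total, but half the tokens are cache hits at an assumed 90% discount.
const halfCached = estimateInputCost({ inputTokens: 1_000_000, cachedTokens: 1_000_000 }, 3, 0.9);

console.log(uncached.toFixed(2), halfCached.toFixed(2));
```

Under these assumed numbers the two estimates come to about $6.00 and $3.30, which shows why raw token counts can overstate actual spending when caching is active.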
What happens when a budget is exceeded
When your agent reaches a budget limit:
- The current execution stops gracefully.
- The agent sends a message explaining it hit a spending limit.
- The limit resets when the time window passes (hourly or daily) or automatically on the next execution (for per-execution limits).
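One way to picture the three windows and their reset behaviour is a small tracker that keeps a per-execution counter plus a timestamped usage history. This is a minimal sketch under stated assumptions: the `BudgetTracker` class, its method names, and the "first exceeded window wins" order are hypothetical and may not match how Comis implements budgets.

```typescript
// Hypothetical sketch: class and method names are assumptions, not Comis internals.
interface BudgetLimits {
  perExecution: number;
  perHour: number;
  perDay: number;
}

type BudgetWindow = "perExecution" | "perHour" | "perDay";

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

class BudgetTracker {
  private executionTokens = 0;
  private usageLog: { at: number; tokens: number }[] = [];
  private readonly limits: BudgetLimits;

  constructor(limits: BudgetLimits) {
    this.limits = limits;
  }

  // The per-execution counter resets automatically when a new execution starts.
  startExecution(): void {
    this.executionTokens = 0;
  }

  // Record usage at time `now` (milliseconds) and drop entries older than a day.
  record(tokens: number, now: number): void {
    this.executionTokens += tokens;
    this.usageLog.push({ at: now, tokens });
    this.usageLog = this.usageLog.filter((e) => now - e.at < DAY_MS);
  }

  // Return the first exceeded window, or null if the agent may continue.
  exceeded(now: number): BudgetWindow | null {
    const since = (ms: number): number =>
      this.usageLog
        .filter((e) => now - e.at < ms)
        .reduce((sum, e) => sum + e.tokens, 0);

    if (this.executionTokens >= this.limits.perExecution) return "perExecution";
    if (since(HOUR_MS) >= this.limits.perHour) return "perHour";
    if (since(DAY_MS) >= this.limits.perDay) return "perDay";
    return null;
  }
}

const tracker = new BudgetTracker({ perExecution: 2_000_000, perHour: 10_000_000, perDay: 100_000_000 });
tracker.startExecution();
tracker.record(2_000_000, 0);
console.log(tracker.exceeded(0)); // the per-execution limit has been reached
tracker.startExecution();
console.log(tracker.exceeded(1000)); // a new execution starts with a fresh counter
```

Passing the clock in as `now` keeps the sketch deterministic; a real implementation would more likely read the current time itself.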
Circuit breaker
The circuit breaker works like a fuse in your electrical panel. If your agent hits too many errors in a row, the circuit breaker trips and pauses the agent. This prevents cascading failures: one bad API response will not cause your agent to retry endlessly and rack up costs. The circuit breaker has three states:
- Closed (normal operation): Everything works normally. Your agent processes messages and calls its AI provider as usual. Behind the scenes, Comis counts consecutive errors. As long as requests succeed, the error count resets to zero.
- Open (tripped): After too many consecutive errors (default: 5 in a row), the circuit breaker trips. The agent stops making API calls entirely. No tokens are used, no costs accumulate. This is the “fuse has blown” state.
- Half-open (testing): After a cooldown period (default: 60 seconds), the circuit breaker lets the agent try one request. If it succeeds, the breaker closes and normal operation resumes. If it fails, the breaker opens again and waits for another cooldown period.
This pattern means a temporary outage at your AI provider causes a brief pause, not an avalanche of failed requests. Once the provider recovers, your agent automatically resumes.
Step limit
Your agent has a maximum number of reasoning steps per execution (default: 150). Each step is one cycle of the agent thinking and optionally using a tool. The step limit prevents infinite loops where the agent keeps thinking and using tools without making progress. If the agent reaches the limit, it stops and delivers whatever response it has so far. For most conversations, agents finish well within 150 steps, typically 3-10 steps for a normal response.
Context window guard
The context window guard monitors how full the agent’s working memory is. Every AI model has a maximum amount of text it can process at once, and approaching that limit can cause problems. The guard has two thresholds:
- Warn (default: 80%): When the context window is 80% full, the agent receives a warning. It can still work but should avoid adding more context.
- Block (default: 95%): At 95% full, the guard stops the agent from making further API calls. This prevents out-of-memory errors and garbled responses that happen when the context overflows.
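The step limit and the two context thresholds can be pictured as checks inside a single agent loop. The sketch below is illustrative only: `checkContext`, `runAgent`, and the shape of the `step` callback are hypothetical names, and the point at which the guard is checked (before each step) is an assumption.

```typescript
// Hypothetical sketch: names are illustrative, not Comis internals.
interface GuardConfig {
  maxSteps: number;     // default 150 in the text above
  warnPercent: number;  // default 80
  blockPercent: number; // default 95
}

type GuardDecision = "ok" | "warn" | "block";

// Classify context usage against the warn and block thresholds.
function checkContext(usedTokens: number, windowTokens: number, cfg: GuardConfig): GuardDecision {
  const percent = (usedTokens * 100) / windowTokens;
  if (percent >= cfg.blockPercent) return "block";
  if (percent >= cfg.warnPercent) return "warn";
  return "ok";
}

// Run steps until the agent finishes, the step limit is hit, or the guard blocks.
// `step` stands in for one think-and-maybe-use-a-tool cycle; it reports the
// context size afterwards and whether the agent is done.
function runAgent(
  step: (n: number) => { usedTokens: number; done: boolean },
  windowTokens: number,
  cfg: GuardConfig,
): { steps: number; stoppedBy: "done" | "maxSteps" | "contextBlock" } {
  let usedTokens = 0;
  for (let n = 1; n <= cfg.maxSteps; n++) {
    if (checkContext(usedTokens, windowTokens, cfg) === "block") {
      return { steps: n - 1, stoppedBy: "contextBlock" };
    }
    const result = step(n);
    usedTokens = result.usedTokens;
    if (result.done) return { steps: n, stoppedBy: "done" };
  }
  return { steps: cfg.maxSteps, stoppedBy: "maxSteps" };
}

const cfg: GuardConfig = { maxSteps: 150, warnPercent: 80, blockPercent: 95 };

// A step function that never finishes and adds 100 tokens per step.
const looping = runAgent((n) => ({ usedTokens: n * 100, done: false }), 200_000, cfg);
console.log(looping.stoppedBy, looping.steps);
```

In this run the loop never fills the context window, so the step limit is what stops it.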
Tool-output sanitization
Every tool result that comes back from a built-in or MCP tool is run through `sanitizeToolOutput()`
(`packages/agent/src/safety/tool-output-safety.ts`) before it touches the
agent’s context:
- NFKC normalisation: collapses Unicode compatibility variants (e.g. fullwidth letters and ligatures that look like plain ASCII) to a canonical form.
- Invisible-character stripping — removes zero-width spaces, soft hyphens, and other non-printable runs that prompt-injection payloads use to hide.
- Per-tool size limits — large outputs are truncated up front and the full result is offloaded to disk via the microcompaction path (see Cache).
- Image sanitiser — image attachments returned by tools go through a separate pipeline that re-encodes the file and strips EXIF metadata.
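The first three steps above can be illustrated with standard string operations. This is a minimal sketch and not the Comis `sanitizeToolOutput()` implementation: the function name `sanitizeText`, the exact set of stripped characters, and the character-based size limit are all assumptions, and the offload-to-disk and image steps are left out.

```typescript
// Illustrative only: this is not the Comis sanitizeToolOutput() implementation.
// Matches soft hyphen, zero-width characters, directional marks, word joiner, and BOM.
const INVISIBLE = /[\u00AD\u200B-\u200F\u2060\uFEFF]/g;

function sanitizeText(raw: string, maxChars: number): { text: string; truncated: boolean } {
  // NFKC folds compatibility forms (fullwidth letters, ligatures) into plain equivalents.
  const normalized = raw.normalize("NFKC");
  // Remove characters that render as nothing but can hide injected instructions.
  const visible = normalized.replace(INVISIBLE, "");
  // Truncate by UTF-16 code units; a real limit might count bytes or tokens instead.
  const truncated = visible.length > maxChars;
  return { text: truncated ? visible.slice(0, maxChars) : visible, truncated };
}

// Fullwidth "ign", a zero-width space, and an "fi" ligature hidden in the input.
const result = sanitizeText("\uFF49\uFF47\uFF4E\u200Bore \uFB01le", 100);
console.log(result.text); // prints "ignore file"
```

Normalising before stripping means that any invisible characters produced by normalisation are also removed in the same pass.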
Action classifier and broken follow-through
After the model finishes a reply, Comis runs two safety checks before the response leaves the agent:
- Broken follow-through detection (`detectBrokenFollowThrough()` in `packages/agent/src/safety/response-safety-checks.ts`): flags responses where the agent claims to have done something it never actually did (e.g. “I have sent the email” when no `message` tool was called).
- Post-compaction safety reinject: when context compaction has just fired, a short reminder of the agent’s current task is re-injected so the model does not silently lose the thread.
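To make the idea of broken follow-through concrete, here is a toy heuristic that compares claims in the reply against the tools that actually ran. It is not the Comis `detectBrokenFollowThrough()` implementation, whose signature and logic are not shown here; the function name `findUnbackedClaims`, the regular expressions, and the `write_file` tool name are hypothetical.

```typescript
// Toy heuristic, not the Comis detectBrokenFollowThrough() implementation.
// Each rule pairs a claim pattern with the tool that should back it up.
// The patterns and tool names are illustrative assumptions.
interface ClaimRule {
  claim: RegExp;
  requiredTool: string;
}

const RULES: ClaimRule[] = [
  { claim: /\bI(?: have|'ve)? sent the (?:email|message)\b/i, requiredTool: "message" },
  { claim: /\bI(?: have|'ve)? saved the file\b/i, requiredTool: "write_file" },
];

// Return the tools the reply claims to have used but that never ran.
function findUnbackedClaims(reply: string, toolsCalled: string[]): string[] {
  const called = new Set(toolsCalled);
  return RULES
    .filter((rule) => rule.claim.test(reply) && !called.has(rule.requiredTool))
    .map((rule) => rule.requiredTool);
}

// The claim is flagged when no message tool ran, and passes when it did.
console.log(findUnbackedClaims("I have sent the email.", []));
console.log(findUnbackedClaims("I have sent the email.", ["message"]));
```

A pattern list like this is easy to read but brittle; it only catches phrasings it has been told about.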
SDK retry
When your AI provider returns a temporary error (like “too many requests” or a server error), Comis automatically retries the request (up to 5 times by default) before giving up. This handles brief network glitches and rate limits without requiring any action from you. The retry uses exponential backoff: it waits a short time before the first retry and progressively longer before subsequent retries, up to a maximum delay. This is gentler on the provider and more likely to succeed than immediate retries.
Configuration
| Option | Type | Default | What it does |
|---|---|---|---|
| `budgets.perExecution` | number | 2000000 | Maximum tokens per single agent run |
| `budgets.perHour` | number | 10000000 | Maximum tokens per hour (rolling window) |
| `budgets.perDay` | number | 100000000 | Maximum tokens per day (rolling window) |
| `circuitBreaker.failureThreshold` | number | 5 | Consecutive failures before the breaker trips |
| `circuitBreaker.resetTimeoutMs` | number | 60000 | Cooldown before testing recovery (1 minute) |
| `circuitBreaker.halfOpenTimeoutMs` | number | 30000 | Timeout for the half-open test request |
| `maxSteps` | number | 150 | Maximum reasoning steps per execution |
| `contextGuard.enabled` | boolean | true | Enable context window monitoring |
| `contextGuard.warnPercent` | number | 80 | Warn when context window is this percent full |
| `contextGuard.blockPercent` | number | 95 | Block execution when context window is this percent full |
| `sdkRetry.enabled` | boolean | true | Enable automatic retry for temporary errors |
| `sdkRetry.maxRetries` | number | 5 | Maximum retry attempts |
These options are set in `~/.comis/config.yaml`.
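As a sketch of how the `circuitBreaker.*` and `sdkRetry.*` options interact at runtime, the code below models the three breaker states and an exponential backoff delay. The `CircuitBreaker` class and `backoffDelayMs` function are hypothetical names, the base and maximum delays are assumed values (the table gives no defaults for them), and `halfOpenTimeoutMs` is not modelled.

```typescript
// Hypothetical sketch: class and function names are assumptions, not Comis internals.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;

  constructor(failureThreshold: number, resetTimeoutMs: number) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
  }

  // May a request be sent at time `now` (milliseconds)?
  // Simplification: this sketch does not limit half-open to exactly one in-flight request.
  canRequest(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open"; // cooldown elapsed: allow a trial request
    }
    return this.state !== "open";
  }

  // Any success resets the consecutive-failure count and closes the breaker.
  onSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failed trial request, or too many consecutive failures, opens the breaker.
  onFailure(now: number): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }

  current(): BreakerState {
    return this.state;
  }
}

// Exponential backoff: the delay doubles per attempt, capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

// Defaults from the table: 5 consecutive failures, 60000 ms cooldown.
const breaker = new CircuitBreaker(5, 60_000);
for (let i = 0; i < 5; i++) breaker.onFailure(0);
console.log(breaker.current(), breaker.canRequest(30_000), breaker.canRequest(60_000));
```

With these inputs the breaker is open after the fifth failure, refuses a request at 30 seconds, and allows a trial request once the 60-second cooldown has elapsed.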
Models
Configure AI providers and automatic failover.
Sessions
How conversations are managed and when they reset.
Resilience
Timeout guards, provider health monitoring, and dead-letter queue.
