Complete reference for all configuration options with types, defaults, and validation rules
What this is for: the single source of truth for everything you can put in ~/.comis/config.yaml.

Who it’s for: anyone tuning agent behavior, wiring channels, hardening the gateway, or composing layered configs across environments.

Comis uses a single config.yaml file (or multiple layered files) to configure every aspect of the system. The configuration contains 41 top-level sections with 54+ nested schemas defining agent behavior, channel adapters, security policies, gateway settings, and more. All schemas use Zod strict validation: unknown keys are rejected at startup.

Config files are specified via the COMIS_CONFIG_PATHS environment variable (colon-separated, like the shell PATH). See the Configuration Guide for a step-by-step setup walkthrough.
| Syntax | Description | Example |
| --- | --- | --- |
| $include | Reference another YAML file for modular composition (max 10 levels deep) | $include: ./channels.yaml |
| ${VAR_NAME} | Substitute an environment variable or stored secret via SecretManager | token: "${TELEGRAM_BOT_TOKEN}" |
| $${VAR_NAME} | Escape syntax: produces literal ${VAR_NAME} in the output | example: "$${NOT_SUBSTITUTED}" |
| $VAR_NAME | Bare reference: auto-corrected to ${VAR_NAME} with a warning | token: $MY_TOKEN (corrected) |
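As a rough sketch of how these forms might sit together in one file: the key names below are borrowed from the reference deployment later on this page, and whether $include may appear at the top level (rather than under a specific key) is an assumption, not something this reference confirms.

```yaml
# Sketch only. The placement of $include is an assumption.
$include: ./channels.yaml                # pull channel settings from a sibling file

gateway:
  tokens:
    - id: cli-default
      secret: "${COMIS_GATEWAY_TOKEN}"   # resolved from the environment or SecretManager

agents:
  default:
    name: "Costs $${NOT_SUBSTITUTED}"    # escaped: stays as the literal text ${NOT_SUBSTITUTED}
```

Writing the braces explicitly avoids the auto-correction warning that a bare $VAR_NAME reference produces.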
Strict validation (z.strictObject) means any unrecognized key in your config file causes a startup error. Check key names carefully against this reference.
A realistic single-tenant deployment running one Telegram-connected research assistant with persistent memory, gateway auth, scheduled heartbeat, and approval gates for sensitive actions. Copy this as a starting point and trim anything you don’t need — defaults take care of the rest.
# ~/.comis/config.yaml: annotated reference deployment
tenantId: research-lab            # Used to scope memory and sessions
logLevel: info                    # trace/debug/info/warn/error/fatal
dataDir: ""                       # Empty resolves to ~/.comis

# -- One agent: research assistant on Anthropic Sonnet 4.5 --------------------
agents:
  research-bot:
    name: "Research Bot"
    provider: anthropic
    model: claude-sonnet-4-5-20250929
    maxSteps: 50                  # Reasoning steps per execution
    cacheRetention: long          # Anthropic prompt cache: long beats short
    budgets:
      perExecution: 1000000       # 1M tokens per single run
      perHour: 5000000
      perDay: 50000000
    rag:
      enabled: true               # Auto-retrieve relevant memory before LLM call
      maxResults: 8
      minScore: 0.15
    skills:
      toolPolicy:
        profile: full             # minimal | coding | messaging | supervisor | full
        deny: ["exec"]            # No shell exec for this agent
    session:
      resetPolicy:
        mode: hybrid              # daily + idle whichever fires first
        dailyResetHour: 4
        idleTimeoutMs: 14400000   # 4 hours
    scheduler:
      heartbeat:
        enabled: true             # Proactive check-ins
        intervalMs: 1800000       # 30 min

# -- One inbound channel ------------------------------------------------------
channels:
  telegram:
    enabled: true
    botToken: "${TELEGRAM_BOT_TOKEN}"
    allowFrom: ["123456789"]      # Only this Telegram user can reach the agent
    mediaProcessing:
      transcribeAudio: true
      analyzeImages: true

# -- Memory: SQLite + local embeddings ---------------------------------------
memory:
  dbPath: "memory.db"
  walMode: true
  embeddingModel: "text-embedding-3-small"
  embeddingDimensions: 1536
  retention:
    maxAgeDays: 365               # Keep one year of memory entries (0 = no age limit)

# -- Gateway: HTTP + WebSocket on loopback only ------------------------------
gateway:
  enabled: true
  host: "127.0.0.1"               # 0.0.0.0 only with TLS in front
  port: 4766
  tokens:
    - id: cli-default
      secret: "${COMIS_GATEWAY_TOKEN}"
      scopes: ["*"]
  rateLimit:
    windowMs: 60000
    maxRequests: 200

# -- Approvals: gate destructive actions -------------------------------------
approvals:
  enabled: true
  defaultPolicy: prompt           # prompt | allow | deny
  rules:
    - match: { tool: "exec", argMatches: ["rm ", "DROP ", "delete from"] }
      action: deny
    - match: { tool: "exec" }
      action: prompt

# -- Scheduler: cron + heartbeat ---------------------------------------------
scheduler:
  cron:
    enabled: true
    defaultTimezone: "America/New_York"
  heartbeat:
    enabled: true
    intervalMs: 600000            # 10 min global heartbeat coalescer
The values shown are real (not placeholders). Save it as ~/.comis/config.yaml, populate ~/.comis/.env with TELEGRAM_BOT_TOKEN and COMIS_GATEWAY_TOKEN, and comis daemon start will boot a working deployment.
Multi-agent configuration map. Each key is an agent ID, and the value is a PerAgentConfig object that extends AgentConfig with skills, scheduler, session, concurrency, and broadcast settings.

Type: Record<string, PerAgentConfig>

Default: { default: PerAgentConfig.parse({}) } (one default agent with all defaults)
Anthropic prompt cache retention: none, short, long
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| maxContextChars | number | 100000 | Maximum total characters for context window (~25k tokens) |
| maxToolResultChars | number | 50000 | Maximum characters per tool result before truncation. Small/nano-class models default lower (small 12000, nano 8000) so a single large result can’t fill the window; set this explicitly (or use capabilityClassOverride: frontier) to keep the full 50000. |
| preserveRecent | number | 4 | Minimum recent messages to always preserve during compaction |
| workspacePath | string | (unset) | Path to agent workspace directory containing identity files |
| reactionLevel | enum | (unset) | Reaction frequency: minimal (1 per 5-10 exchanges) or extensive (react freely) |
| language | string | (unset) | Reply language for deterministic degraded replies (context-exhausted / output-starved notices). Accepts a BCP-47 tag ("he") or an English display name ("Hebrew"). When omitted, Comis auto-detects the reply language from the USER.md preferred language, then the inbound message script (Hebrew, Arabic, and Russian/Cyrillic only). Does not affect the live agent reply, which always follows the user’s language. |
| enforceFinalTag | boolean | false | When enabled, only content inside <final> blocks reaches users. Suppresses all content outside final tags on both streaming and non-streaming paths. |
| fastMode | boolean | false | Enables fast mode for the LLM provider (provider-specific behavior). |
| storeCompletions | boolean | false | When enabled, sends store: true to OpenAI-compatible providers for completion storage. |
| oauthProfiles | Record<string, string> | {} | Per-provider OAuth profile preference. Keys are provider ids; values are profile ids in <provider>:<identity> form (e.g., openai-codex:user@example.com). The daemon resolves at LLM-call time as: this map -> lastGood per provider -> first available profile in the store. See OAuth concepts -> Multi-account profiles. |
Multi-account example — two agents, two ChatGPT accounts:
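A minimal sketch of such a setup, assuming two hypothetical agent ids (work-bot and personal-bot) and two placeholder identities; only the oauthProfiles key, the openai-codex provider id, and the <provider>:<identity> value form come from the description above.

```yaml
agents:
  work-bot:                     # hypothetical agent id
    oauthProfiles:
      openai-codex: "openai-codex:work@example.com"
  personal-bot:                 # hypothetical agent id
    oauthProfiles:
      openai-codex: "openai-codex:personal@example.com"
```

If an agent has no entry for a provider, resolution falls through to lastGood for that provider and then to the first available profile in the store, as described in the oauthProfiles row.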
Operation Model Overrides (agents.*.operationModels)
Override the model and timeout for specific internal operation types. Each entry is an OperationModelEntry with optional model (string "provider:modelId") and timeout (number) fields. The key is timeout (milliseconds) — timeoutMs is rejected by the strict config parser. Omit any type to use the agent’s primary model.
| Operation type | When it fires | Recommended model |
| --- | --- | --- |
| cron | Scheduled cron task execution | - |
| heartbeat | Periodic heartbeat check | - |
| subagent | Sub-agent task delegation | - |
| compaction | Context compaction summarization | Capable model or contextEngine.compaction.strongerSummarizerModel |
| taskExtraction | Task extraction from conversation | - |
| condensation | Memory condensation | Capable model |
| verification | Pre-delivery critic (R4). Fires when agents.<id>.verification.enabled=true and the response is a completion-claiming response meeting minResponseChars. | Cheap model: "anthropic:claude-haiku-4-5-20250929" or "ollama:qwen3.6:27b" for local self-check. Omit to use the primary model. |
| planning | Pre-execution planner (R5, deferrable on M2). | Capable model for checklist generation. Omit to use the primary model. |
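A short sketch of two overrides, assuming the research-bot agent id from the reference deployment; the timeout values are illustrative, not recommended settings.

```yaml
agents:
  research-bot:
    operationModels:
      verification:
        model: "anthropic:claude-haiku-4-5-20250929"   # cheap critic, "provider:modelId" form
        timeout: 30000                                 # milliseconds; the key is timeout, not timeoutMs
      compaction:
        model: "anthropic:claude-sonnet-4-5-20250929"  # capable summarizer
        timeout: 120000
```

Operation types left out of the map (cron, heartbeat, and so on) keep using the agent’s primary model.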
Stall budget for primary prompt calls (3 minutes): the deadline resets on activity — stream text/thinking deltas (throttled ~1/s) and tool completions. A turn dies only when NO activity occurs for this long. Stall semantics apply to all providers.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| retryPromptTimeoutMs | number | 60000 | Wall-clock timeout for retry and fallback prompt calls (1 minute). Used during auth rotation and model fallback attempts. Whole-turn (not stall-based): retry and fallback prompts do not reset on activity. |
| stallCeilingMultiplier | number | 10 | Makespan ceiling: a turn is aborted at promptTimeoutMs × stallCeilingMultiplier even while still streaming, which bounds runaway generations that would otherwise reset the stall budget forever. Valid range 1–100 (a value below 1 would fire the ceiling before the stall budget; the product is additionally capped at Node’s ~24.8-day timer limit). Runtime-tunable. |
See Resilience for how prompt timeouts fit into the full resilience stack.
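To make the ceiling arithmetic concrete, here is a sketch of the three timeout keys. The parent section is left out because this excerpt does not show where they nest, and the promptTimeoutMs value of 180000 is an assumption derived from the "3 minutes" stall budget mentioned above.

```yaml
# Parent section omitted: this excerpt does not show where these keys nest.
promptTimeoutMs: 180000        # stall budget, 3 minutes (assumed value); resets on activity
retryPromptTimeoutMs: 60000    # wall-clock, 1 minute; does not reset on activity
stallCeilingMultiplier: 10     # hard ceiling = 180000 ms x 10 = 1800000 ms (30 minutes)
```

With these values a turn that keeps streaming is still aborted after 30 minutes, while a turn that goes silent is aborted after 3.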
Enable automatic memory retrieval before LLM calls
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| maxResults | number | 5 | Maximum memory results to retrieve |
| maxContextChars | number | 4000 | Maximum characters of memory context injected |
| minScore | number | 0.1 | Minimum RRF score threshold (0-1) |
| includeTrustLevels | string[] | ["system", "learned"] | Trust levels to include in retrieval |
| baseFloor | number (0–1) | 0 for frontier/mid; 0.15 for small/nano (capability-gated) | Minimum BASE relevance score (pre-boost) for memory injection. Boosts cannot resurrect a memory whose base score falls below this threshold. Gates on ScoreBreakdown.base (un-boosted cosine/RRF score), applied after scoreWithBreakdown(). Setting baseFloor: 0 is the unset sentinel, so the capability default applies (0.15 for small/nano). Any value greater than 0 is treated as explicit and wins over the capability default. |
Security (S6): A weaker ModelProfile cannot lower this below the operator-set value. FROZEN_TRUST_PATHS and MemoryWriteValidator remain enforced for all capabilityClasses regardless of this setting.
Capability-gated default (Phase 158): For small/nano agents, rag.baseFloor defaults
to 0.15 — dropping memories with a base relevance score below 0.15 (the Phase 153 poison
mitigation). For frontier/mid, the default remains 0 (no filter). Setting
rag.baseFloor: 0 explicitly is treated the same as “unset” — the capability default applies.
To disable the relevance floor on a small/nano agent, set
providers.<id>.capabilityClass: "frontier" to override the capability class, or set
rag.baseFloor: 0.01 (any value greater than 0 is treated as explicit and wins over the capability default).
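A minimal sketch of the second option, assuming a hypothetical small/nano agent with the id local-bot:

```yaml
agents:
  local-bot:             # hypothetical small/nano agent id
    rag:
      enabled: true
      baseFloor: 0.01    # explicit (> 0), so the 0.15 capability default no longer applies
```

Writing baseFloor: 0 here would not have the same effect, because 0 is read as "unset" and the 0.15 default would still apply.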
Cross-encoder rerank (agents.*.rag.rerank), opt-in (default off)

A cross-encoder re-scores the top fusion candidates with the local reranker model (see memory.rerankerModel). Disabled by default; on timeout or unavailability it falls back to the fusion-ranked order, and the reranker GGUF is never downloaded while disabled.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable cross-encoder reranking of fused candidates (opt-in) |
| maxCandidates | number | 40 | Candidate cap bounding worst-case rerank latency (positive) |
| minResults | number | 1 | Skip reranking when fewer than this many candidates are present (nonnegative) |
| timeoutMs | number | 800 | Rerank wall-clock timeout in ms; on timeout fall back to fusion order (positive) |
Scoring boosts (agents.*.rag.scoring), all number, range 0-1

Multiplicative boosts applied to the reranked-or-fused score before the trust filter.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| recencyAlpha | number | 0.2 | Recency boost weight (applied via createdAt) |
| temporalAlpha | number | 0.2 | Event-time proximity boost weight (applies when occurredAt is present, neutral when absent) |
| proofAlpha | number | 0.1 | Proof-count boost weight (neutral until proofCount exists) |
| trustAlpha | number | 0.1 | Trust-level boost weight and tie-break |
Entity associative lane (agents.*.rag.entityLane), opt-in (default off)

A one-hop entity-associative fusion lane. When disabled (the default), RRF fusion is unchanged.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable the one-hop entity-associative lane (opt-in) |
| seedCount | number | 5 | How many top search hits seed the entity self-join (positive) |
| perEntityCap | number | 200 | Max shared-entity neighbour rows the lane returns (positive) |
| weight | number | 1.0 | RRF weight for the entity lane (≥ 0) |
Enabling the opt-in recall features

Reranking, the entity lane, and consolidation all ship off. The reranker model and the recall trace live under the top-level memory and diagnostics keys (not under agents). A schema-valid block that turns them on:
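The original block is not reproduced in this excerpt. As a partial sketch, limited to the agents.*.rag.rerank and agents.*.rag.entityLane keys documented above (the agent id is borrowed from the reference deployment; the consolidation, memory.rerankerModel, and diagnostics settings are left out because their exact keys and values are not shown here):

```yaml
agents:
  research-bot:
    rag:
      enabled: true
      rerank:
        enabled: true        # opt in to cross-encoder reranking
        maxCandidates: 40
        timeoutMs: 800       # on timeout, fall back to fusion order
      entityLane:
        enabled: true        # opt in to the one-hop entity lane
        seedCount: 5
        weight: 1.0
```

The non-boolean values repeat the documented defaults, so only the two enabled flags change behavior.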
20000 for frontier/mid; 3500 for small/nano (capability-gated)
Per-file character limit for workspace files injected into the system prompt. For capabilityClass in {small, nano}, the effective default drops to 3500 chars per file (SD6, Phase 159). Setting maxChars: 20000 explicitly is treated the same as “unset”, so the capability default of 3500 still applies for small/nano. Use capabilityClassOverride: frontier or set an explicit value other than 20000 to override.
Per-agent configuration for Gemini explicit CachedContent caching. When enabled, Comis creates server-side cached content on Google AI Studio for a guaranteed 90% discount on cached input tokens.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable Gemini explicit CachedContent caching. Only activates for Google AI Studio providers (not Vertex AI). |
| maxActiveCaches | number | 20 | Maximum active cached contents per agent. Must be a positive integer. Oldest entries are evicted (LRU) when this limit is reached. |
Gemini cache TTL is hardcoded at 3600 seconds (1 hour) and is not configurable per-agent. Caches are automatically refreshed when more than 50% of the TTL has elapsed. This setting is independent of the Anthropic cacheRetention field — they control different provider caching mechanisms.
Context engine configuration. Controls the system that manages what your agent sees each turn. The default is dag (the v2.12 lossless LCD engine). The pipeline value (the simpler sequential layered engine) is the first-class opt-in: set version: "pipeline" to use it. DAG does lossless verbatim assembly (full faithful history + a verbatim fresh tail of the most recent steps + transcript repair), zoomably compresses the oldest history under a token budget, and exposes the in-session ctx_* expansion tools. See Compaction for a user-friendly explanation.

Core fields

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | true | Master toggle for the context engine |
| version | string | "dag" | Operating mode: "dag" (default, the lossless LCD engine) or "pipeline" (opt-in, the sequential layered system) |
Shared fields (both modes)

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| thinkingKeepTurns | number | 10 | Recent assistant turns that retain thinking blocks (1-50) |
| compactionModel | string | "" | Model used for summarization. Empty (the default) means runtime-resolved against the agent’s primary provider. Override with a specific cheaper or faster model if needed. |
| evictionMinAge | number | 15 | Minimum turn age before stale errors are evicted by the dead content evictor (3-50) |
| | | | Character threshold before observation masking activates (50K-1M) |
| compactionCooldownTurns | number | 5 | Turns to wait before re-triggering LLM compaction (1-50) |
| compactionPrefixAnchorTurns | number | 2 | User turns preserved at conversation head for cache prefix stability (0-10) |
| outputEscalation.enabled | boolean | true | Allow escalating output token budget when context is compacted |
| outputEscalation.escalatedMaxTokens | number | 32768 | Maximum output tokens after escalation (4096-128000) |
| observationDeactivationChars | number | 80000 | Character threshold to deactivate observation masking entirely (20K-500K) |
| ephemeralKeepWindow | number | 10 | Recent ephemeral tool results to keep unmasked (1-50) |
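A sketch of switching one agent to the pipeline engine while adjusting a few shared fields. It assumes the engine block nests as agents.<id>.contextEngine and that the dotted outputEscalation.* keys map to a nested object; the agent id comes from the reference deployment.

```yaml
agents:
  research-bot:
    contextEngine:
      enabled: true
      version: "pipeline"            # opt in to the sequential layered engine
      thinkingKeepTurns: 10
      compactionCooldownTurns: 5
      outputEscalation:              # assumed nesting for outputEscalation.*
        enabled: true
        escalatedMaxTokens: 32768    # must stay within 4096-128000
```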
DAG mode fields (version: "dag") — freshTailTurns, the leaf/condense summarization fields, and budget-bounded eviction are active in the current release (lossless verbatim assembly + threshold-triggered leaf summarization + multi-tier condensation + budget eviction); only the on-demand ctx_* recall keys (maxExpandTokens, maxRecallsPerDay, recallTimeoutMs) remain reserved until the recall tools land in a later phase:
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| freshTailTurns | number | 8 | Recent steps (assistant + tool round-trips, not user-turns) always kept verbatim and never evicted (1-50) |
| contextThreshold | number | 0.75 | Budget utilization ratio that triggers the end-of-turn leaf-summarization pass and the condense pass’s hard-fanout pressure gate (0.1-0.95). The ratio is computed against the turn’s effective budget window (min of the reconciled context window and the capability-class cap), not the model’s configured contextWindow, so capped or served-bound small models compact at the real window. |
| leafMinFanout | number | 8 | Minimum raw messages before creating a leaf summary (2-20) |
| condensedMinFanout | number | 4 | Minimum leaf summaries before creating a condensed summary (2-20) |
| condensedMinFanoutHard | number | 2 | Absolute minimum fanout for condensed summaries under pressure (2-10) |
| incrementalMaxDepth | number | 0 | Maximum DAG depth for incremental compaction. -1 disables the depth limit (range -1 to 10). |
| leafChunkTokens | number | 20000 | Maximum source tokens per leaf summary chunk (1K-100K). Clamped at runtime to the resolved summarizer model’s context window (the smaller of its configured window and the probed served window when the summarizer runs on the served-bound provider) minus the summary target, prompt-template overhead, and the threaded previous-summary size, so a small compaction summarizer (e.g. an 8K operationModels.compaction model, or a served-bound primary) is never fed an over-window chunk. Oversized backlogs drain across multiple bounded passes, and a single message larger than the clamped cap is replaced by a bounded deterministic extraction (no LLM call); its full content stays in the lossless message store. |
| leafTargetTokens | number | 1200 | Target token count for each leaf summary output (96-5K) |
| condensedTargetTokens | number | 2000 | Target token count for each condensed summary output (256-10K) |
| maxExpandTokens | number | 4000 | Maximum tokens a recall sub-agent can read per expansion (500-50K) |
| maxRecallsPerDay | number | 10 | Daily limit on recall sub-agent spawns per agent (1-100) |
| recallTimeoutMs | number | 120000 | Timeout for recall sub-agent execution in milliseconds (10K-600K) |
| largeFileTokenThreshold | number | 25000 | File token count above which content is stored as a large file reference (1K-200K) |
| annotationKeepWindow | number | 15 | Recent tool results protected from annotation replacement in DAG assembly (1-50) |
| annotationTriggerChars | number | 200000 | Character threshold before old tool results are annotated with placeholders (10K-1M) |
| summaryModel | string | - | Optional model override for DAG summarization (falls back to compactionModel) |
| summaryProvider | string | - | Optional provider override for DAG summarization |
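A sketch of a DAG tuning block, assuming these fields sit directly under agents.<id>.contextEngine alongside version (the flat nesting is an assumption). The values repeat the documented defaults, and the 32000-token window in the comment is a made-up figure used only to show the threshold arithmetic.

```yaml
agents:
  research-bot:
    contextEngine:
      version: "dag"
      freshTailTurns: 8            # steps always kept verbatim
      contextThreshold: 0.75       # e.g. a 32000-token effective window triggers at 24000 tokens
      leafMinFanout: 8
      leafChunkTokens: 20000       # clamped at runtime to the summarizer's window
      leafTargetTokens: 1200
      condensedMinFanout: 4
      condensedTargetTokens: 2000
```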
DAG robustness / spend / deferred-compaction fields (version: "dag") — all active in the current release:
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| deferCompaction | boolean | true | Run the afterTurn leaf + condense passes in the background on the per-conversation single-flight serializer (never blocking the turn). false runs them inline at end-of-turn (deterministic, for tests) |
| summarizerSpend.maxTokensPerTenantPerHour | number | 500000 | Per-tenant rolling-hour ceiling on summarizer (input+output) tokens; over the cap the summarizer is bypassed → truncation-only assembly (no LLM call, no turn failure). 0 disables the hourly cap (min 0) |
| summarizerSpend.maxTokensPerTenantPerDay | number | 5000000 | Per-tenant rolling-day summarizer token ceiling. 0 disables the daily cap (min 0) |
| summarizerBreaker.failureThreshold | number | 5 | Consecutive summarizer failures before the breaker opens → truncation-only assembly (min 1) |
| summarizerBreaker.resetTimeoutMs | number | 60000 | How long the summarizer breaker stays open before a half-open trial, in milliseconds |
| summarizerBreaker.halfOpenTimeoutMs | number | 30000 | Half-open trial window for the summarizer breaker, in milliseconds |
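A sketch of tightening summarizer spend and the breaker for one agent. It assumes the dotted keys above expand to nested objects under agents.<id>.contextEngine; the non-default numbers are illustrative only.

```yaml
agents:
  research-bot:
    contextEngine:
      deferCompaction: true
      summarizerSpend:
        maxTokensPerTenantPerHour: 250000   # half the documented default of 500000
        maxTokensPerTenantPerDay: 0         # 0 disables the daily cap
      summarizerBreaker:
        failureThreshold: 3                 # open after 3 consecutive failures
        resetTimeoutMs: 60000
        halfOpenTimeoutMs: 30000
```

When either the spend cap or the breaker trips, assembly degrades to truncation-only rather than failing the turn.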
DAG mode fields are validated by Zod even when version is "pipeline". This means you can pre-configure DAG settings before switching modes — invalid values will be caught at startup regardless of the active mode.
The DAG cross-agent isolation adds an agent_id column to the full-text search index, created once on a fresh database (no migration — sessions start fresh on the LCD engine). A pre-existing development ~/.comis database created before this release may need to be wiped to pick up the isolated index; a fresh install needs nothing. See Compaction.
Engine scope — served-window honesty and the viable floor (the recorded pipeline-parity verdict).
The turn-time pre-flight fit check, output-headroom enforcement, and the served/cap
context_exhausted provenance (the exhaustion text that names OLLAMA_CONTEXT_LENGTH /
PARAMETER num_ctx / contextEngine.budget.effectiveContextCapSmall — see the
served-window section) apply to the DEFAULT
version: "dag" engine only. The "pipeline" engine’s fit guard is its budget-aware
compaction trigger — it compacts when the estimated context exceeds 85% of the budget
window, computed against the UNCAPPED configured contextWindow (no served reconcile, no
capability-class cap) — plus reactive provider-side context_too_long classification when
the provider rejects an oversized request. (The trigger behavior is test-pinned.)
Pipeline compaction likewise summarizes only the longest oldest-first span that fits the
summarizer’s window (reserving a summarizer-sized output allowance — at most a quarter of
that window — for the summary itself) and keeps the un-summarized remainder in context
(never dropped). When even the oldest message alone exceeds that budget, that single
message is escalated through the compaction fallback ladder (worst case a bounded
count-only note), so every evaluation shrinks the backlog and the 85% trigger re-fires on
later turns until it drains.

Recorded decision: the boot viable-floor WARN (the minViable equation) is deliberately
ENGINE-AGNOSTIC — it fires for dag AND pipeline agents alike, because the minViable
arithmetic (bootstrap + tool schemas + output headroom + fresh-tail reserve + safety margin
vs the effective window) holds regardless of engine. What differs per engine is which
TURN-TIME guard backs it up — dag: the pre-flight with knob-named exhaustion; pipeline: the
85% compaction trigger + reactive classification. An operator running a pipeline agent
should read the boot WARN as applying to them, while the per-turn preflight surfaces do not.
Prevents 256K context-window models from over-provisioning when running a small executive (e.g., qwen3.6:27b). Applied in computeTokenBudgetForProfile before history budget computation. frontier and mid classes always receive the full contextWindow.
Maximum effective context tokens for capabilityClass="small". 0 = no cap (use raw contextWindow). Applied to prevent 256K overfill degrading a 27–35B executive.
Minimum visible output tokens guaranteed on every LLM dispatch — the non-reasoning floor (answer or tool-call body) that must remain after the model’s thinking block. The total output headroom is thinkingReserve(reasoningStyle, thinkingLevel) + minVisibleOutputTokens. Raising this value increases the safety margin but reduces available history tokens. Applies to all capability classes; the thinking reserve on top of this floor varies by thinkingLevel (high adds 2,048 tokens; xhigh adds 4,096 tokens; models with no thinking block add 0).
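As a sketch of the small-class cap described above: the path below combines agents.<id>.contextEngine with the contextEngine.budget.effectiveContextCapSmall name that appears in the engine-scope note, which is an assumption about nesting; the agent id and the value are hypothetical.

```yaml
agents:
  local-bot:                              # hypothetical small-class agent id
    contextEngine:
      budget:
        effectiveContextCapSmall: 32768   # illustrative cap in tokens; 0 would mean no cap
```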
Controls whether the thinking-effort governor may automatically adjust the active thinkingLevel when the remaining context window after eviction is tight.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| agents.<id>.thinking.downshiftOnTightWindow | boolean | true | When true, the thinking-effort governor may automatically lower thinkingLevel (high → medium → low) for a dispatch when the remaining context room after eviction cannot cover thinkingReserve(thinkingLevel) + minVisibleOutputTokens. This prevents the model’s thinking block from consuming the entire output budget and silently truncating the answer or tool call. For frontier and mid-capability models the governor is always a no-op regardless of this setting (their effective windows are large enough that the threshold is never reached). Set to false to disable down-shifting and always preserve the configured thinkingLevel (may result in context_exhausted degradation on tight windows instead of graceful down-shift). |
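A minimal sketch of pinning the configured thinking level on a hypothetical small-class agent (the agent id is an assumption; the key path is the one in the table above):

```yaml
agents:
  local-bot:                          # hypothetical small/nano agent id
    thinking:
      downshiftOnTightWindow: false   # keep the configured thinkingLevel, accepting possible context_exhausted
```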
Controls how the context engine handles LLM summarization for low-capability models. Applied in both pipeline llm-compaction and DAG leaf-summarizer layers.
When true: small/nano capabilityClass → eviction-first (or strongerSummarizerModel if set) instead of same-model LLM summarization. Prevents degraded summaries from a weak model. A context:compaction_routed event fires when routing occurs.
Optional "provider:modelId" for a stronger summarizer when small/nano models are detected. Empty string = pure eviction/deterministic fallback. Example: "anthropic:claude-haiku-4-5-20250929". A keyless local provider (Ollama / LM Studio) is also valid (e.g. "qwen36-local:qwen3.6:35b") — no API key required.
Security (S4): Eviction never drops security-relevant context (sender-trust, safety reinforcement, untrusted-content markers, canary token). The security-context-pinner enforces this fail-closed.
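A sketch of routing small/nano summarization to a stronger model. The path assumes the contextEngine.compaction.strongerSummarizerModel name used in the operation-models table nests under agents.<id>; the agent id is hypothetical.

```yaml
agents:
  local-bot:                   # hypothetical small/nano agent id
    contextEngine:
      compaction:
        strongerSummarizerModel: "anthropic:claude-haiku-4-5-20250929"   # "" would mean eviction-only
```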
Controls the compact-secure promptMode for small/nano models. This mode assembles the system prompt to a bounded target while always retaining the full safety core, sender-trust, and config-secret sections.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| agents.<id>.contextEngine.compactPrompt.enabled | boolean | true | Enable compact-secure promptMode for small/nano capabilityClass. When true, retains the FULL safety core, sender-trust, and config-secret sections and never falls back to minimal mode’s empty safety. frontier/mid agents are unaffected. |
Soft token target for the compact prompt (~chars/3.5). At 3000 ≈ 10,500 chars.
Security warning (S1): Setting enabled=false does not make the prompt smaller — it restores the full-size full promptMode. The compact prompt is safe for all deployments because it retains the security core. It is not safe to use minimal promptMode (which drops the safety block) for any security-sensitive deployment.
Controls whether within-conversation history is assembled relevance-first (the margin arbiter allocates the contended history budget across tiers by fused rank, with the fresh-tail and security-pinned floors guaranteed) or recency-first (the existing newest-kept eviction). The decision is capability-gated: small/nano models on a non-caching provider default relevance-first (reordering is free when there is no prompt cache to break); frontier/mid and any prompt-caching model default recency-first and are byte-identical to prior releases (the arbiter does not run for them). Precedence: explicit per-agent config > capability default > off.
Force the relevance policy. true runs the margin arbiter at the eviction seam (relevance-first); false keeps recency-first. The field is optional with no default — omit it to let the capability gate decide. An explicit value (either direction) always wins. Setting true on a frontier/mid agent opts that agent into relevance-first; setting false on a non-caching small/nano agent forces recency-first.
Capability-gated default (Phase 173): The small/nano relevance-first default is gated on
supportsPromptCache=false — a non-caching local model (typical Ollama) reorders history for
free, while a caching model stays recency-first below the cache fence (reordering would break the
prefix cache). frontier/mid are byte-identical by default (the arbiter never runs for them).
The small/nano default-on flip is measurement-gated (validated by the outcome harness before
being relied upon in production); the mechanism ships behind this flag. Precedence:
explicit per-agent config > capability default > off.
Security (RETR-05): The arbiter never demotes security-relevant history (canary token,
untrusted-content delimiters, safety reinforcement, sender-trust markers) — those items are
unconditional floors, excluded from relevance candidacy and always kept, exactly as the
recency path’s pre-flight already protects them. A content-free context:arbitrated event
(per-tier kept counts; the discretionary pool offered and consumed plus the
unconditional floor-token weight; the kept LTM/KG ids; and a relevanceFirst boolean)
fires only on the relevance-first path; it carries no message, memory, or query content.
Periodic LLM-driven consolidation of similar memories into observations. Opt-in (default off) because each run spends model tokens — it is a cost gate. When enabled, a scheduled cron clusters near-duplicate memories and folds each cluster into a single observation; external-trust memories are excluded by default.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable periodic consolidation for this agent (opt-in cost gate) |
| schedule | string | "30 3 * * *" | Cron schedule for consolidation runs (daily at 03:30 UTC by default) |
| similarityThreshold | number | 0.82 | Cluster-neighbour cosine threshold for single-link clustering (0-1) |
| dedupThreshold | number | 0.9 | Content-similarity threshold for the deterministic dedup pre-check (0-1) |
| maxCandidatesPerRun | number | 200 | Maximum raw candidates fetched per run (positive) |
| maxClusterSize | number | 12 | Maximum candidates folded into one observation (positive) |
| maxClustersPerRun | number | 25 | Maximum clusters consolidated per run (positive) |
| maxConsolidationTokens | number | 1024 | Maximum LLM response tokens for one merge call (positive) |
| consolidateExternal | boolean | false | Include external-trust memories in consolidation (excluded by default) |
Injects the current execution objective and uncompleted step checklist at the context tail every turn. Helps weak models stay on task across multi-turn executions. Default-ON for scaffolded (small/nano) tiers; off by default for frontier/mid, but injected when enabled: true is set explicitly. Precedence: explicit per-agent config > capability default > off.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| agents.<id>.goalAnchor.enabled | boolean | automatic for small/nano (no config needed); false for frontier/mid | Enable GoalAnchor tail injection. Default-ON for small/nano (no config needed); off for frontier/mid unless enabled: true is set explicitly (which injects on frontier/mid too). When effective, the execution objective and uncompleted step checklist are tail-appended each turn. Precedence: explicit per-agent config > capability default > off. |
| agents.<id>.goalAnchor.maxChars | number (100–2000) | 500 | Maximum characters for the injected GoalAnchor block. ~5–10 steps at ~50 chars/step. |
Capability-gated default (Phase 158): For capabilityClass in {small, nano} (e.g. any Ollama
qwen3.6 deployment), goalAnchor is default-ON — no enabled: true required. Set
agents.<id>.goalAnchor.enabled: false explicitly to disable it on a small/nano agent.
For frontier/mid agents, behavior is unchanged (default off). Precedence:
explicit per-agent config > capability default > off.
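A sketch of both explicit overrides, assuming a hypothetical small/nano agent (local-bot) and the frontier-class research-bot from the reference deployment:

```yaml
agents:
  local-bot:              # hypothetical small/nano agent: GoalAnchor would be on by default
    goalAnchor:
      enabled: false      # explicit off wins over the capability default
  research-bot:           # frontier/mid agent: GoalAnchor is off by default
    goalAnchor:
      enabled: true       # explicit on injects the block here too
      maxChars: 500
```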
A pre-delivery critic that checks the terminal response against the GoalAnchor checklist before delivery. Unmet requirements redirect the executor; exhausted retries deliver an honest unmet-list. Meaningful only for scaffoldLevel=max (small/nano) agents.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| agents.<id>.verification.enabled | boolean | cost-gated automatic for small/nano when a distinct cheap critic is configured; false otherwise | Enable pre-delivery verification critic. Fires only when a completion-claiming response meets minResponseChars. Defaults to false (opt-in) for frontier/mid, and for small/nano when no distinct cheap critic model is configured. |
| agents.<id>.verification.minResponseChars | number (50–2000) | 200 | Minimum response length in characters before the critic is invoked. Prevents firing on short acks, clarifying questions, and non-completion replies. |
Security (S2): The critic treats the output-under-review as untrusted (wrapExternalContent), inherits the safety core, embeds the canary, fails closed (uncertain → not-verified, never auto-approve), and re-validates any implied tool calls through the same exec gates. Use agents.<id>.operationModels.verification to run the critic on a cheaper or faster model.
Capability-gated default + cost-gate (Phase 158): For capabilityClass in {small, nano},
verification is default-ON — but only when agents.<id>.operationModels.verification
resolves to a distinct cheaper model (i.e. operationModels.verification.model is explicitly
configured to a different, faster model). If no distinct critic model is configured, the default
stays OFF — the critic never silently doubles local-CPU inference latency. Set
agents.<id>.verification.enabled: false explicitly to force-off the critic on a small/nano
agent even when a cheap critic is configured. Set agents.<id>.verification.enabled: true to
force-on the critic regardless of class (including frontier/mid, if a critic model is configured).
Precedence: explicit per-agent config > capability default > off.
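As a rough sketch of the cost-gate, the fragment below configures a distinct critic model and leaves verification.enabled unset, so a small/nano agent would pick up the capability default. The agent id and the model value are placeholders, and the exact format of the model identifier is an assumption.

```yaml
agents:
  local-qwen:                      # hypothetical agent id (small/nano class assumed)
    operationModels:
      verification:
        model: "critic-model-id"   # placeholder: a distinct, cheaper model than the agent's own
    verification:
      # enabled is omitted on purpose: with a distinct critic model configured,
      # the small/nano capability default turns the critic on
      minResponseChars: 200        # documented default; skips short acks and questions
```

Adding enabled: false under verification would force the critic off even with the critic model configured.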
Maximum critic retry redirects. After this many not-verified verdicts, the executor delivers an honest unmet-list instead of an unqualified “done”, which prevents infinite re-prompt loops.
Directories to scan for SKILL.md files. For named agents, Comis automatically prepends the agent’s workspace skills directory (~/.comis/workspace-{agentId}/skills/) at startup — you do not need to include it here.
watchEnabled
boolean
true
Enable file watching for automatic skill reload
watchDebounceMs
number
400
Debounce interval in ms (100-5000)
Built-in Tools (agents.*.skills.builtinTools)
Key
Type
Default
Description
read
boolean
true
Read file contents
write
boolean
true
Write or overwrite files
edit
boolean
true
Surgical search-and-replace on files
grep
boolean
true
Regex search across files (requires rg)
find
boolean
true
Find files by glob pattern (requires fd)
ls
boolean
true
List directory contents
exec
boolean
true
Shell command execution
process
boolean
true
Background process management
webSearch
boolean
true
Web search API integration
webFetch
boolean
true
URL content fetching
browser
boolean
true
Headless browser control
Tool Policy (agents.*.skills.toolPolicy)
Key
Type
Default
Description
profile
enum
"full"
Baseline tool set: minimal, coding, messaging, supervisor, full
allow
string[]
[]
Additional tools to allow beyond the profile
deny
string[]
[]
Tools to deny even if in the profile
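A minimal sketch combining the two tables above for one agent. The agent id is hypothetical, and how builtinTools and toolPolicy interact when they disagree is not specified in this reference, so the example keeps them non-overlapping.

```yaml
agents:
  my-agent:                  # hypothetical agent id
    skills:
      builtinTools:
        webSearch: false     # switch off the web search integration
      toolPolicy:
        profile: coding      # baseline tool set
        allow: ["browser"]   # add a tool beyond the profile
        deny: ["exec"]       # removed even if the profile includes it
```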
Prompt Skills (agents.*.skills.promptSkills)
Key
Type
Default
Description
maxBodyLength
number
20000
Maximum skill body length in characters
enableDynamicContext
boolean
false
Enable shell command execution in skill bodies
maxAutoInject
number
3
Maximum prompt skills auto-injected per request (0-20)
Path to SQLite database file (relative to dataDir)
walMode
boolean
true
Enable WAL mode for concurrent reads
embeddingModel
string
"text-embedding-3-small"
Embedding model identifier
embeddingDimensions
number
1536
Embedding vector dimensions
Reranker model (memory.reranker*)
The cross-encoder model used when agents.*.rag.rerank.enabled is true. The hf: URI auto-downloads on first enable; nothing is downloaded while reranking is off. This is a distinct model from the bi-encoder embedder (see the embedding accordion); do not conflate the two.
Thread count for the reranker ranking context (positive)
Compaction (memory.compaction)
Key
Type
Default
Description
enabled
boolean
true
Whether automatic compaction is enabled
threshold
number
1000
Minimum entries before compaction triggers
targetSize
number
500
Maximum entries after compaction
Retention (memory.retention)
Key
Type
Default
Description
maxAgeDays
number
0
Maximum age in days (0 = no limit)
See Memory and Search for how these settings affect agent memory behavior.
embedding
Embedding provider configuration for vector search. Supports local GGUF models via node-llama-cpp or remote OpenAI.
Key
Type
Default
Description
enabled
boolean
true
Enable embedding generation. When false, only FTS5 search is used.
provider
enum
"auto"
Provider preference: auto (tries local then remote), local, openai
autoReindex
boolean
true
Auto-reindex when provider model changes
multilingual
boolean
(unset)
Advisory: declare the embedder multilingual for the comis fleet model-health line. Omitted: inferred from the model id (bge-m3 / multilingual-e5 / LaBSE / E5 read as multilingual; otherwise unknown). Does not gate search — the FTS5 trigram floor carries recall regardless. See Multilingual.
Context size for embedding model (tokens). nomic-embed-text-v1.5 trains on 2048; extending to 8192 requires YaRN RoPE scaling not available in node-llama-cpp.
OpenAI (embedding.openai)
Key
Type
Default
Description
model
string
"text-embedding-3-small"
OpenAI embedding model
dimensions
number
1536
Vector dimensions (must match model output)
Cache (embedding.cache)
Key
Type
Default
Description
maxEntries
number
10000
Maximum cached embeddings in L1 in-memory cache (0 = disabled)
persistent
boolean
false
Enable persistent L2 SQLite cache
persistentMaxEntries
number
50000
Maximum entries in L2 persistent cache
ttlMs
number
(none)
TTL in milliseconds for cache entries. When unset, LRU eviction only.
pruneIntervalMs
number
300000
Prune check interval in milliseconds (5 min)
Batch (embedding.batch)
Key
Type
Default
Description
batchSize
number
100
Texts per batch call
indexOnStartup
boolean
true
Index unembedded memories on startup
See Embeddings for a guide on choosing between local and remote embedding providers.
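The embedding keys above might be combined as follows for a remote-only setup with a persistent cache. This is a sketch using the documented defaults except where a comment says otherwise; the ttlMs value is an arbitrary illustration.

```yaml
embedding:
  enabled: true
  provider: openai            # skip the local attempt that "auto" would make first
  autoReindex: true
  openai:
    model: "text-embedding-3-small"
    dimensions: 1536          # must match the model's output size
  cache:
    maxEntries: 10000         # L1 in-memory cache
    persistent: true          # turn on the L2 SQLite cache (default is false)
    persistentMaxEntries: 50000
    ttlMs: 86400000           # 24 hours; leave unset for LRU-only eviction
  batch:
    batchSize: 100
    indexOnStartup: true
```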
Mount the @comis/web SPA at /app/* and the REST/SSE API at /api/* (sharing gateway host/port/auth). When false, the daemon skips /app/*, /api, SSE, and the / → /app/ redirect.
Security configuration for log redaction, audit logging, permissions, action confirmation, agent-to-agent messaging, and encrypted secrets.
Key
Type
Default
Description
logRedaction
boolean
true
Enable structured log redaction of sensitive fields
auditLog
boolean
true
Enable audit event logging
Permission (security.permission)
Key
Type
Default
Description
enableNodePermissions
boolean
false
Enable Node.js --permission flag enforcement
allowedFsPaths
string[]
[]
Allowed filesystem read/write paths
allowedNetHosts
string[]
[]
Allowed network hosts for outbound connections
Action Confirmation (security.actionConfirmation)
Key
Type
Default
Description
requireForDestructive
boolean
true
Require confirmation for destructive actions
requireForSensitive
boolean
false
Require confirmation for sensitive actions
autoApprove
string[]
[]
Actions that bypass confirmation
Agent-to-Agent (security.agentToAgent)
Key
Type
Default
Description
enabled
boolean
true
Enable cross-agent session messaging
maxPingPongTurns
number
3
Max reply-back loop turns (0-5)
allowAgents
string[]
[]
Allowed agent IDs for sub-agents (empty = all)
subAgentRetentionMs
number
3600000
Retention for completed sub-agent sessions (1 hour)
waitTimeoutMs
number
60000
Default timeout for wait mode (60 seconds)
subAgentMaxSteps
number
50
Default max steps for sub-agent execution
subAgentToolGroups
enum[]
["coding"]
Default tool profile groups: minimal, coding, messaging, supervisor, full
subAgentMcpTools
enum
"inherit"
MCP tool inheritance: inherit or none
Subagent Context (security.agentToAgent.subagentContext)
Controls how sub-agent sessions receive context, condense results, and manage their lifecycle. All fields have sensible defaults, so you only need to configure values you want to change. See Subagent Context Lifecycle for a full explanation.
Key
Type
Default
Description
maxSpawnDepth
number
3
Maximum spawn chain depth (1-10). A depth of 3 means parent -> child -> grandchild.
maxChildrenPerAgent
number
5
Maximum concurrent active children per parent agent (1-20). Graph pipeline nodes bypass this limit.
maxResultTokens
number
4000
Token threshold for condensation (100-100,000). Results under this pass through unchanged.
resultRetentionMs
number
86400000
How long full result files are kept on disk before auto-sweep (default 24 hours).
condensationStrategy
enum
"auto"
When to condense: auto (based on token count), always (force condensation), never (always passthrough).
Inject the sub-agent’s objective after compaction so it survives context trimming.
artifactPassthrough
boolean
true
Pass artifact file references from the spawn call to the sub-agent’s context.
autoCompactThreshold
number
0.95
Context fill ratio (0.5-1.0) for triggering auto-compaction. This field is present in the schema but its runtime effect on the context engine compaction trigger is being refined in a future release.
maxRunTimeoutMs
number
600000
Maximum wall-clock time for a sub-agent run before watchdog force-fail (10 minutes). Hard ceiling regardless of step count.
perStepTimeoutMs
number
60000
Per-step time budget for dynamic watchdog calculation (1 minute). Dynamic timeout = min(max_steps x perStepTimeoutMs, maxRunTimeoutMs).
errorPreservation
boolean
true
Preserve error details in condensed results instead of summarizing them away.
narrativeCasting
boolean
true
Format sub-agent results with tagged prefixes and metadata for the parent agent.
resultTagPrefix
string
"Subagent Result"
Tag prefix used in narrative casting (1-100 characters). Appears as [{prefix}: {label}].
parentSummaryMaxTokens
number
1000
Token limit for the parent context summary when includeParentHistory is "summary" (100-10,000).
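The dynamic watchdog formula from perStepTimeoutMs can be worked through with concrete numbers. The sketch below assumes the max_steps term is the sub-agent's step limit (subAgentMaxSteps unless a spawn overrides it); that mapping is an assumption, not something this table states.

```yaml
security:
  agentToAgent:
    subAgentMaxSteps: 8           # assumed to be the max_steps term in the formula
    subagentContext:
      perStepTimeoutMs: 60000     # 1 minute per step
      maxRunTimeoutMs: 600000     # 10-minute hard ceiling
      # Dynamic timeout = min(8 x 60000, 600000) = min(480000, 600000) = 480000 ms (8 min)
      # With the default subAgentMaxSteps of 50: min(3000000, 600000) = 600000 ms (ceiling applies)
```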
Storage Mode (security.storage)
Key
Type
Default
Description
storage
"encrypted" | "file" | "env"
"encrypted"
Credential storage mode (security.storage) for all three stores (secrets, OAuth profiles, MCP tokens).
Three modes are supported:
storage: "encrypted" (default, secure-by-default): AES-256-GCM-encrypted rows in ~/.comis/secrets.db (SQLite). Requires SECRETS_MASTER_KEY to be set (generated automatically on first boot). Defends against disk/backup theft at the cost of making SECRETS_MASTER_KEY the crown jewel.
storage: "file" (plaintext, opt-in trade-off): structured JSON at ~/.comis/secrets.json, ~/.comis/auth-profiles.json, and ~/.comis/mcp-tokens/ with mode 0600 (user-only read) in a 0700 directory. Defends against other local users reading secrets; does not defend against root, disk/backup theft, or process-memory inspection. No SECRETS_MASTER_KEY required. Hot-reload: comis auth login writes are picked up without a daemon restart.
storage: "env" (read-only posture): snapshots .env/process.env into the SecretManager at boot and scrubs sensitive names from process.env. Runtime writes (env.set, secrets.set, comis auth login) are rejected with an actionable error. Use for read-only deployments where credentials are injected via environment variables.
security.storage is runtime-immutable — changing it requires editing config.yaml
and restarting the daemon. The config schema is z.strictObject, so unknown keys are
rejected at boot. Back up ~/.comis/config.yaml before editing.
Mode mismatch detection: If you switch modes while credentials remain in the inactive
backend, the daemon emits a boot WARN naming the stranded store and the manual migration step.
Cross-mode migration tooling is planned for a future release.
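A short sketch of a security block that opts into file storage and tightens action confirmation. Every key comes from the tables in this section; the chosen values are illustrative only.

```yaml
security:
  storage: file                  # "encrypted" (default) | "file" | "env"
  logRedaction: true
  auditLog: true
  actionConfirmation:
    requireForDestructive: true
    requireForSensitive: true    # stricter than the default (false)
```

Because security.storage is runtime-immutable, a change like this takes effect only after a daemon restart.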
See Security for a comprehensive overview of the security model.
approvals
Action approval workflow. Rules are evaluated in order (first match wins).
Key
Type
Default
Description
enabled
boolean
true
Enable the approval workflow
defaultMode
enum
"auto"
Default mode for unmatched actions: auto, require, deny
rules
ApprovalRule[]
[]
Ordered approval rules
defaultTimeoutMs
number
300000
Approval request timeout in ms (5 min)
denialCacheTtlMs
number
60000
How long denied actions are cached before re-prompting (ms). Set to 0 to disable denial caching.
batchApprovalTtlMs
number
30000
How long approved actions are cached for automatic re-approval in batch operations (ms). Set to 0 to disable batch approval caching.
Batch approval caching (batchApprovalTtlMs) allows sequential identical tool calls to auto-approve within the TTL window, reducing approval fatigue during batch operations. The cache persists across daemon restarts.
Approval Rule (approvals.rules[])
Key
Type
Default
Description
actionPattern
string
(required)
Pattern matching action types
mode
enum
"auto"
Approval mode: auto, require, deny
timeoutMs
number
300000
Timeout for human approval (0 = no timeout)
minTrustLevel
enum
"verified"
Trust level for auto-approve: untrusted, basic, verified, admin
See Approvals for approval workflow configuration patterns.
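The first-match-wins ordering can be sketched as below. The actionPattern values are placeholders: this reference does not specify the pattern syntax or the action type names, so substitute real ones from the Approvals guide.

```yaml
approvals:
  enabled: true
  defaultMode: auto                 # applies when no rule matches
  denialCacheTtlMs: 60000
  batchApprovalTtlMs: 30000
  rules:
    # Rules are checked top to bottom; the first match decides.
    - actionPattern: "<destructive-action-pattern>"   # placeholder, syntax not documented here
      mode: require
      timeoutMs: 120000             # wait up to 2 minutes for a human decision
    - actionPattern: "<routine-action-pattern>"       # placeholder
      mode: auto
      minTrustLevel: admin          # auto-approve only at admin trust
```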
The tool-first capability layer. Operator-only — agents cannot self-configure capability routing or detour policy. The entire tooling tree is added to IMMUTABLE_CONFIG_PREFIXES and rejected by config.patch from agent-callable surfaces.
Restart required. Changes to tooling.capabilityIndex.enabled and tooling.installDetours.mode apply only after the daemon restarts. The capabilityIndex.enabled toggle selects between two cached system-prompt shapes (one-line residual vs flat tool dump); in-process reload is not supported. Operator config edits go through the standard config.patch / config.apply path which validates → writes → 200 ms delayed SIGUSR1 → process restart.
Top-level fields:
capabilityClusters — cluster definitions and builtin tool→cluster assignments. Operator-defined clusters merge key-by-key with the three reserved IDs (external-integrations, prompt-skills, other-tools); operator values win per key.
mcp.capabilityHints — operator hints for connected MCP servers (record keyed by server name; each entry is { cluster, description, replacesPackages }).
skills.capabilityHints — operator hints for prompt skills (record keyed by skill name or skill key; each entry is { cluster, description?, replacesPackages }).
capabilityIndex.enabled — boolean toggle for the per-turn ## Capabilities block (default true).
installDetours.mode — "observe" / "advise" / "soft-stop" (default "advise"). Controls how the install-detour validator acts when an exec command would pip install / npm install a package that overlaps with an already-connected MCP server or skill.
Operator typos in any cluster reference (under capabilityClusters.builtinAssignments[*], mcp.capabilityHints[*].cluster, or skills.capabilityHints[*].cluster) emit a Pino WARN at daemon startup with errorKind: "config", an operator-actionable hint, and the offending { configPath, unresolvedClusterId } payload — the daemon does NOT crash; unresolved cluster references fall back to external-integrations (for MCP hints) or prompt-skills (for skill hints). Check pm2 logs comis or journalctl -u comis after restart to surface typos.
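The top-level fields listed above might fit together like this. It is a sketch under assumptions: the server name github is hypothetical, and replacesPackages is assumed to be a list of package names.

```yaml
tooling:
  capabilityIndex:
    enabled: true                         # per-turn ## Capabilities block
  installDetours:
    mode: observe                         # "observe" | "advise" (default) | "soft-stop"
  mcp:
    capabilityHints:
      github:                             # hypothetical MCP server name
        cluster: external-integrations    # one of the three reserved cluster IDs
        description: "Issues, pull requests, and repository search"
        replacesPackages: ["PyGithub"]    # assumed shape: list of package names
```

A misspelled cluster value here would not stop the daemon; it would produce the startup WARN described above and fall back to external-integrations.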
Enable Markdown IR pipeline for format-aware chunking
Per-Channel Override (streaming.perChannel.*)
Key
Type
Default
Description
enabled
boolean
true
Enable streaming for this channel
chunkMode
enum
"paragraph"
Chunk mode
chunkMaxChars
number
(unset)
Max chars per block (falls back to platform limit)
chunkMinChars
number
100
Min chars before allowing split
typingMode
enum
"thinking"
Typing indicator mode
typingRefreshMs
number
6000
Typing refresh interval in ms. Per-platform defaults are applied automatically (Telegram 4s, Discord 8s, etc.) — this field overrides the automatic default
typingCircuitBreakerThreshold
number
3
Consecutive typing failures before circuit breaker permanently stops indicator
typingTtlMs
number
60000
Maximum typing indicator duration in ms before auto-stop (refreshes on content signals)
useMarkdownIR
boolean
true
Use Markdown IR pipeline
tableMode
enum
"code"
Table conversion mode
Each per-channel entry also includes nested deliveryTiming and coalescer overrides with the same schema as the top-level versions.
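A per-channel override might look like the sketch below. It assumes entries are keyed by channel name (telegram here, matching the key used under channels); enum values other than the documented defaults are not listed in this table, so only defaults are shown for those fields.

```yaml
streaming:
  perChannel:
    telegram:                  # assumed key: the channel name
      enabled: true
      chunkMode: paragraph
      chunkMaxChars: 3500      # explicit cap; when unset, the platform limit is used
      chunkMinChars: 100
      typingRefreshMs: 4000    # overrides the automatic per-platform default
      typingTtlMs: 60000
      tableMode: code
```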
See Delivery for block streaming behavior and platform-specific delivery.
autoReplyEngine
Controls whether the agent pipeline activates for inbound messages. Separate from the pattern-based auto-reply rules in integrations.autoReply.
Key
Type
Default
Description
enabled
boolean
true
Enable the auto-reply engine
groupActivation
enum
"mention-gated"
Group chat mode: always, mention-gated, custom
customPatterns
string[]
[]
Custom regex patterns for custom mode
historyInjection
boolean
true
Inject non-trigger group messages as context
maxHistoryInjections
number
50
Max history-injected messages per session
maxGroupHistoryMessages
number
20
Max group history messages stored per session
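For custom group activation, a sketch with two illustrative patterns follows. Both regexes are hypothetical examples, and the regex dialect is assumed to be the JavaScript one.

```yaml
autoReplyEngine:
  enabled: true
  groupActivation: custom
  customPatterns:
    - '^!ask\b'               # hypothetical: messages that start with "!ask"
    - '\bresearch bot\b'      # hypothetical: messages that mention the bot by name
  historyInjection: true      # non-trigger group messages still become context
  maxHistoryInjections: 50
  maxGroupHistoryMessages: 20
```

Single quotes keep YAML from interpreting the backslashes in the patterns.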
sendPolicy
Outbound message gating rules. Rules evaluated in order; first match wins.
Key
Type
Default
Description
enabled
boolean
true
Enable send policy enforcement
defaultAction
enum
"allow"
Default action: allow or deny
rules
SendPolicyRule[]
[]
Ordered list of rules
Send Policy Rule (sendPolicy.rules[])
Key
Type
Default
Description
channelId
string
(unset)
Channel ID to match
chatType
string
(unset)
Chat type: dm, group, thread, channel, forum
channelType
string
(unset)
Channel type: telegram, discord, slack, whatsapp
action
enum
"allow"
Action: allow or deny
description
string
(unset)
Human-readable description
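Because the first matching rule wins, rule order matters. The sketch below blocks outbound messages to Telegram groups while allowing other Telegram chats; it assumes that an unset match field matches any value, which this table does not state explicitly.

```yaml
sendPolicy:
  enabled: true
  defaultAction: deny           # nothing goes out unless a rule allows it
  rules:
    - channelType: telegram
      chatType: group
      action: deny
      description: "No outbound messages to Telegram groups"
    - channelType: telegram     # chatType unset: assumed to match any chat type
      action: allow
      description: "Allow all other Telegram chats"
```

Swapping the two rules would let group messages through, since the broader allow rule would match first.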
envelope
Message envelope enrichment for inbound messages before they reach the LLM.
Key
Type
Default
Description
timezoneMode
string
"utc"
Timezone: utc, local, or IANA timezone string
timeFormat
enum
"12h"
Time display: 12h or 24h
showElapsed
boolean
true
Show elapsed time since previous message
showProvider
boolean
true
Show platform prefix (e.g., [telegram])
elapsedMaxMs
number
86400000
Max elapsed time to display (24 hours)
messages
Messaging UX configuration for outbound message formatting.
Key
Type
Default
Description
maxOutboundLength
number
0
Max outbound message length (0 = no limit)
splitLongMessages
boolean
true
Split long messages into parts
splitMaxChars
number
4000
Character limit per split part
splitSeparator
string
"\n\n"
Separator between split parts
showTypingIndicator
boolean
true
Show typing indicator during processing
systemMessagePrefix
string
"[System] "
Prefix for system messages
readReceipts
boolean
false
Enable read receipts
models
Model catalog and alias configuration for model discovery and friendly names.
Key
Type
Default
Description
scanOnStartup
boolean
false
Enable automatic model scanning
scanTimeoutMs
number
30000
Scan timeout in ms
aliases
ModelAlias[]
[]
Friendly model aliases
defaultModel
string
""
Default model ID (falls back to claude-sonnet-4-5-20250929)
LLM provider configuration. API keys are referenced by SecretManager key name, never stored in plaintext.
Key
Type
Default
Description
entries
Record<string, ProviderEntry>
{}
Named provider configurations
Provider Entry (providers.entries.*)
Key
Type
Default
Description
type
string
(required)
Provider type (e.g., "anthropic", "openai", "ollama")
name
string
""
Display name
baseUrl
string
""
API base URL override
apiKeyName
string
""
SecretManager key name for API key
enabled
boolean
true
Whether this provider is enabled
timeoutMs
number
120000
Config-echo only — NOT enforced on completion calls. The completion deadline is agents.<id>.promptTimeout.promptTimeoutMs (stall budget). Setting a non-default value emits a one-time boot WARN naming the real knob.
maxRetries
number
2
Max retries for transient errors
headers
Record<string, string>
{}
Custom headers for API requests
capabilities
ProviderCapabilities
(auto-detected)
Provider-level behavioral overrides. Usually auto-detected from provider type; manual config overrides auto-detection. See sub-schema below.
models
UserModel[]
[]
User-defined model entries for this provider. Allows registering custom/fine-tuned models with capability metadata. See sub-schema below.
Provider family for response handling: default, openai, anthropic, google
dropThinkingBlockModelHints
string[]
[]
Model ID substrings that trigger thinking block suppression
transcriptToolCallIdMode
enum
"default"
Tool call ID format: default (pass-through) or strict9 (truncate to 9 chars for providers with ID length limits)
transcriptToolCallIdModelHints
string[]
[]
Model ID substrings that trigger strict9 tool call ID mode
supportsVision
boolean
false
When true, image attachments are forwarded to the model. When false (or unset), images are warn-dropped with a log entry. Set true for qwen3.6:27b/35b GGUF variants that support image input.
supportsPromptCache
boolean
(auto from providerFamily)
Whether models on this provider support prompt caching. Auto-detected from providerFamily: true for anthropic/google, false for default/openai. Set explicitly to override. When false, prompt assembly emits a single block with no cache_control split overhead.
supportsStructuredOutput
boolean
false
When true, tool-call repair uses constrained decoding (e.g., Ollama’s /api/generate format param) for near-miss tool JSON. When false, lenient parse-and-repair is used.
capabilityClass
"frontier" | "mid" | "small" | "nano"
(resolver heuristic)
Explicit capability-class override for all models on this provider. Overrides the default resolver heuristic (which keys off model context-window and provider-family). Forces scaffoldLevel (GoalAnchor, critic) and securityLevel (lockdown intensity). Use "small" for an Ollama provider running qwen3.6. Setting "frontier" for a weak model silences scaffolding — not recommended for production.
probeServedWindow
boolean
true for type: "ollama"
Probe the Ollama served num_ctx at daemon boot and reconcile with the configured contextWindow. When unset (or true), Comis queries GET /api/ps and POST /api/show on start and uses the smaller of the served window vs. configured. Set to false to skip the probe if Ollama is offline at daemon start.
v2.15 capability-gated defaults: The v2.14 reliability scaffold (GoalAnchor, relevance floor,
cost-gated critic) is now default-ON for small/nano agents — no per-agent opt-in required.
Phase 159 adds two capacity defaults for small/nano: bootstrap.maxChars drops to 3_500
chars per file (SD6), and the active-tool ceiling is set to 24 tools (SD7; overflow tools stay
reachable via discover_tools). Phase 160 adds a total bootstrap budget (5,000 chars sum-cap
for small/nano), capability-gated graph concurrency (small/nano→2, frontier/mid→4), and a
corrected bootstrap-budget warn threshold. frontier/mid behavior is byte-identical to
v2.14 (the non-regression guarantee). The security guarantee holds: a weaker model class cannot
lower the platform’s security posture; the scaffolding defaults reinforce it.
UserModel (providers.entries.*.models[])
Key
Type
Default
Description
id
string
(required)
Model identifier (e.g., "my-finetuned-model")
name
string
(unset)
Display name for the model
reasoning
boolean
false
Whether this model supports extended thinking
contextWindow
number
(unset)
Context window size in tokens
maxTokens
number
(unset)
Maximum output tokens
input
string[]
["text"]
Supported input modalities: "text", "image"
cost
ModelCost
(unset)
Token cost rates for budget tracking. See sub-schema below.
comisCompat
ModelCompatConfig
(unset)
Comis-specific compatibility flags. See sub-schema below.
Tool schema normalization profile: "default", "xai" (strips constraint keywords xAI rejects), or "gbnf" (GBNF-safe structural transforms for llama.cpp-family local providers: collapses nullable anyOf/oneOf and ["T","null"] type arrays, injects properties: {} on free-form objects and a type on typeless nodes — removal/relaxation only, pattern/format are kept)
toolCallArgumentsEncoding
enum
(unset)
How tool call arguments are encoded: "json" or "html-entities" (auto-decoded)
nativeWebSearchTool
boolean
(unset)
Whether this model uses native web search (filters out Comis web-fetch tool)
GBNF auto-detection: providers with type: "ollama" default their models to the gbnf profile automatically. An explicit toolSchemaProfile value always wins for gbnf (set "default" to opt a model out while debugging), unlike xai, whose auto-detected flags are non-negotiable API requirements and override user config.
Explicit opt-in: LM Studio, llama-server (llama.cpp), and vLLM endpoints have no provider type that auto-enables the profile (only type: "ollama" does); they opt in per model via comisCompat.toolSchemaProfile: "gbnf" (zero new config keys; this is the existing comisCompat surface).
Reactive repair: if a provider still rejects a schema at grammar-compile time, Comis classifies the 400 (tool_schema_unsupported), strips pattern/format from the offending tools, and retries exactly once per session before failing honestly; comis explain names the offending tool. A once-per-boot INFO line summarizes which tools were transformed (names and keyword counts only, never schema bodies).
For the served-context-window side of local-provider setup, see Local model context window.
ModelCost (providers.entries.*.models[].cost)
Key
Type
Default
Description
input
number
(unset)
Cost per input token (for budget tracking)
output
number
(unset)
Cost per output token
cacheRead
number
(unset)
Cost per cache-read token (Anthropic prompt caching)
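The UserModel, ModelCost, and comisCompat sub-schemas might be combined as in the sketch below for a llama-server endpoint. The entry name, the type value, and the base URL are assumptions made for illustration; only the key names are taken from this reference.

```yaml
providers:
  entries:
    llama-local:                           # hypothetical entry name
      type: "openai"                       # assumption: an OpenAI-compatible type for llama-server
      baseUrl: "http://localhost:8080/v1"  # hypothetical URL
      models:
        - id: "my-finetuned-model"
          name: "My Fine-tuned Model"
          contextWindow: 32768
          maxTokens: 4096
          input: ["text"]
          cost:
            input: 0                       # local inference: no per-token charge
            output: 0
          comisCompat:
            toolSchemaProfile: "gbnf"      # explicit opt-in; only type "ollama" enables it automatically
```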
The following config.yaml snippet shows the recommended settings for a secure, scaffolded qwen3.6 local deployment. All security-relevant keys from Phases 151–155 are shown with their recommended values.
providers:
  entries:
    qwen36-local:
      baseUrl: "http://localhost:11434/v1"
      # No apiKeyName: keyless Ollama
      capabilities:
        capabilityClass: small          # Forces scaffoldLevel=max + securityLevel=locked
        supportsVision: true            # For 27b/35b GGUF; false for MLX variants
        supportsStructuredOutput: true  # Enables constrained-decode tool-call repair
models:
  defaultModel: "qwen36-local:qwen3.6:35b"
agents:
  default:
    contextEngine:
      budget:
        effectiveContextCapSmall: 32000   # Cap effective history at 32K (prevents 256K overfill)
      compaction:
        preferEvictionByCapability: true  # Evict rather than summarize with the small model
    compactPrompt:
      enabled: true                       # Compact-secure prompt (retains full safety core)
      targetTokens: 3000
    goalAnchor:
      # enabled: true is the capability default for small/nano; omit to rely on the capability default
      enabled: true                       # Explicit: tail-inject objective checklist each turn (scaffoldLevel=max)
      maxChars: 500
    verification:
      # enabled: true is the capability default for small/nano when operationModels.verification is set
      enabled: true                       # Explicit force-on: pre-delivery critic (honest unmet-list on failure)
      minResponseChars: 200
    honesty:
      maxCriticRetries: 2
    rag:
      baseFloor: 0.3                      # Memory relevance floor; capability default is 0.15 for small/nano (0 = sentinel)
These defaults fire only for capabilityClass in {small, nano} (e.g., any Ollama qwen3.6 deployment). The 2026-06-08 re-verification measured ~32K input tokens per qwen3.6 turn — 98 active tool schemas and a 14.4K-char bootstrap file drove a 495% bootstrap warning. Phase 159 adds two capability-gated defaults to address this without changing frontier/mid behavior. Phase 160 adds a total bootstrap budget, capability-gated graph concurrency, and a corrected bootstrap-budget warn threshold.
For capabilityClass in {small, nano}, bootstrap.maxChars defaults to 3_500 chars per file (down from the schema baseline of 20_000). For frontier/mid, the value is unchanged (20_000 per file, byte-identical to v2.14).
Key properties:
Per-file limit. Each workspace file is truncated individually. At 3_500 chars, AGENTS.md (the largest file) is preserved head 70% + tail 20%; smaller identity files (SOUL/IDENTITY/USER/ROLE/TOOLS/HEARTBEAT/BOOT) fit entirely within the limit.
Sentinel: 20_000 is treated as “unset”. Setting bootstrap.maxChars: 20000 explicitly has the same effect as omitting the key — the capability default of 3_500 applies for small/nano. This is because 20_000 is the schema default, so the runtime cannot distinguish “operator chose 20_000” from “no override”. To force the 20_000 limit on a small/nano agent, set agents.<id>.capabilityClassOverride: frontier instead.
If security-critical content lives in the middle of AGENTS.md (between the first 70% and last 20%), it may be truncated for small/nano models. Place critical rules in the first ~2,450 chars or last ~700 chars of AGENTS.md for reliable injection. See AGENTS.md placement guidance for the recommended section order.
To override the capability default for a single agent:
agents:
  my-agent:
    bootstrap:
      maxChars: 8000   # override the small/nano default of 3_500
For capabilityClass: small, at most 24 tool schemas are active in each prompt request. The cold long-tail (tools not in the core set and not recently used) is deferred.
No capability is removed. All deferred tools remain fully callable via the discover_tools mechanism. The model can search for and invoke any deferred tool in the same turn. This ceiling is a prompt-size control, not a security control — it does not restrict which tools the agent is authorized to use.
The ceiling applies only to capabilityClass: small. Key properties:
CORE_TOOLS are never deferred. The following tools are always kept active regardless of the ceiling: read, edit, write, grep, find, ls, apply_patch, exec, process, message, memory_search, memory_store, memory_get, web_search, web_fetch.
Recently-used tools are never deferred. Any tool the agent invoked in recent turns is preserved in the active set.
nano class is not affected. Nano already uses aggressive CORE_TOOLS-only deferral; the 24-tool ceiling does not change its behavior.
frontier/mid classes are not affected. No ceiling applies — behavior is unchanged from v2.14.
Savings: a ceiling of 24 active tools (15 CORE_TOOLS + 9 discretionary slots) instead of the previous 40 saves approximately 750–1,250 tokens per turn (at an average of ~300 chars per tool schema).
For capabilityClass in {small, nano}, the total bootstrap budget caps the sum of all bootstrap file content at 5,000 chars, applied as a second pass after the per-file 3_500-char cap (SD6).
Key properties:
Sum-cap, not per-file. If all workspace files together would exceed 5,000 chars after per-file truncation, each file is proportionally scaled down to fit the total budget.
Per-file floor. Each file retains at least 300 chars, regardless of how many files compete for the budget. No file is silenced entirely.
Proportional truncation. Each file’s allocation is (file_chars / total_chars) * totalMaxChars, floored at 300. Content is taken from the beginning of the file (direct slice — not head+tail; the per-file SD6 pass already applied head+tail truncation).
frontier/mid unaffected. No total cap is applied; behavior is byte-identical to v2.14.
Config override. Explicit agents.<id>.bootstrap.maxChars always takes precedence over both the per-file and total-budget capability defaults.
Typical result: with the default workspace files (AGENTS.md, SOUL, IDENTITY, USER, ROLE, TOOLS, HEARTBEAT, BOOT), the total bootstrap fits within 5,000 chars after per-file truncation; the proportional pass is a safety net, not the primary reducer.
To override the total budget for a single agent:
agents:
  my-agent:
    bootstrap:
      maxChars: 8000   # overrides the small/nano per-file default (3_500)
      # Note: there is no explicit totalMaxChars config key; the total budget is
      # capability-derived (5_000 for small/nano). Override the per-file limit instead,
      # or use capabilityClassOverride: frontier to remove both caps entirely.
For capabilityClass in {small, nano}, the graph coordinator’s maxConcurrency defaults to 2 (down from 4). For frontier/mid, the default remains 4 (byte-identical to v2.14).
Why lower concurrency for small/nano? Local inference on Ollama (or any local GPU-bound runtime) serializes model loads in practice. With 4 concurrent sub-agents, all four issue simultaneous inference requests; the GPU queues them, producing peak saturation that can cause 50–80% of sub-agents to time out on longer prompts. Lowering the default to 2 staggers the load while keeping two in-flight at all times.
To override the default for your deployment:
security:
  agentToAgent:
    graphMaxConcurrency: 4   # restore the frontier default, or set any value >= 1
The operator override always takes precedence over the capability-class default.
Prompt timeout for local Ollama. promptTimeoutMs is a stall budget: the deadline resets on stream activity (text/thinking deltas, throttled ~1/s) and tool completions, so it only needs to cover the longest silent gap. In practice that gap is the prefill before the first token, which on a loaded local GPU (e.g., qwen3.6 with multiple concurrent agents) can exceed the 180,000 ms default. Once the model streams, activity keeps the turn alive; the makespan ceiling (promptTimeoutMs × stallCeilingMultiplier, default ×10) still bounds the total turn even while streaming. For local deployments, raise the stall budget to cover slow prefill:
agents:
  my-agent:
    promptTimeout:
      promptTimeoutMs: 300000   # 5 minutes; covers slow local prefill for qwen3.6 under load
A served context window smaller than configured makes prefill behavior harder to predict; see Local model context window.

Fallback-model recommendation. For deep multi-agent pipelines on local hardware, configure a models.failoverModel pointing to a lighter local model (e.g., a faster quantized variant) that can handle sub-agent calls when the primary model is loaded. This provides graceful degradation rather than timeout cascades.
The Bootstrap content exceeds budget threshold WARN fires when bootstrap files exceed 40% of the estimated total prompt (system prompt chars + tool schema definition chars).

Old behavior (pre-Phase 160): The threshold was 85% of the system prompt character count alone. With the compact-secure prompt (~2,800 chars), this meant bootstrap content over ~2,380 chars triggered the warning, which fired on every small-model turn even with a normal workspace (false-alarm rate: ~100%).

New behavior: The denominator is systemPromptChars + toolDefOverheadChars (the same formula used by executor-tool-assembly.ts for the context-budget breakdown). With a compact-secure system prompt (~2,800 chars) and 24 active tool schemas (~12,000 chars overhead), the denominator is ~14,800 chars. With the Phase 160 F2 total bootstrap budget of 5,000 chars, the ratio is ~34%, below the 40% threshold, so no warning fires under normal conditions.

When does the warn still fire? If an operator adds large custom workspace files that push the total bootstrap above 5,920 chars (40% of ~14,800), the warn fires correctly, signaling that bootstrap content is crowding out the system prompt and tool schemas in a meaningful way.
All capability defaults follow the same precedence: explicit per-agent config > capability default. The D2/D3 capacity defaults are additive: all behaviors activate together for small/nano agents with no per-agent config required.
Relationship to Phase 158 D1 (reliability defaults): The GoalAnchor, rag.baseFloor, and verification critic defaults from Phase 158 follow the same precedence model — explicit per-agent config > capability default > off. The D2/D3 capacity defaults (bootstrap budget, tool ceiling, graph concurrency) are additive: all default-on behaviors activate together for small/nano agents with no per-agent config required.
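To make the precedence rule concrete, the fragment below gathers the three operator overrides from this section into one file. It is a sketch rather than a recommended profile: my-agent is a placeholder agent name, and the values are the ones already used in the individual examples earlier in this section. Any key left out keeps its capability-class default.

```yaml
# Sketch: explicit overrides win over capability-class defaults.
agents:
  my-agent:                      # placeholder agent name
    bootstrap:
      maxChars: 8000             # per-file limit; small/nano default is 3_500
    promptTimeout:
      promptTimeoutMs: 300000    # stall budget; default is 180000
security:
  agentToAgent:
    graphMaxConcurrency: 4       # small/nano default is 2
```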
For agent reliability, prefer general-purpose qwen3.6 variants over coding-tuned models:
General models (e.g., qwen3.6:35b, qwen3.6:27b) are the right choice for agentic tasks. They handle multi-turn conversations, multi-constraint instructions, and tool use reliably.
Coding-tuned models can exhibit goal fixation: continuing to pursue a sub-task at the expense of the original objective. The Comis scaffold (GoalAnchor, verification critic, compact-secure prompt) is designed to detect and correct this behavior, but general models avoid the problem in the first place. The v2.14 small-model milestone originated from a snake game incident involving a coding-tuned model; general-purpose qwen3.6 reproduces this scenario far less often.
The Comis scaffold is designed for general models. It is not a substitute for choosing the right model class.
For local model runtime selection (MLX vs. GGUF), see the Environment Variables reference.
For Ollama providers, the served context window (num_ctx) may differ from the configured
contextWindow. By default, Comis probes GET /api/ps and POST /api/show at daemon boot
and reconciles:
effectiveWindow = min(configured contextWindow, served num_ctx, capability class cap)
The reconciled value is used for context budgeting and the post-turn context-window guard —
so the agent plans against the model’s actual KV-cache limit, not a stale declaration.
For model recommendations with measured receipts and the full local-deployment knob map, see
the Local models playbook.
Probed values are sanitized before use: fractional context_length values are floored to
integers, and implausibly small ones (below 512 tokens, e.g. a typo’d Modelfile
PARAMETER num_ctx) are rejected as bogus, falling back to the configured window via the
probe’s normal fail-open path.

Changing the served window: The probe is read-only; it discovers what Ollama has loaded.
To serve a larger context (up to the model’s native maximum), set it on the Ollama side,
for example with OLLAMA_CONTEXT_LENGTH=131072 ollama serve or with PARAMETER num_ctx 131072
in the Modelfile.
VRAM / KV-cache caveat: A larger num_ctx increases GPU memory usage proportionally
(the KV-cache scales linearly with context length). A 35B model at 256K context can OOM
or cause heavy memory thrashing on consumer hardware. Raise num_ctx only as far as
your VRAM allows; start conservatively (e.g., 32768 or 65536) and measure.

To suppress the boot probe for a provider (e.g., Ollama is offline at daemon start), set
providers.entries.<id>.capabilities.probeServedWindow to false.
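A minimal sketch of the probe opt-out, using the key path that the boot WARN hint names (providers.entries.<id>.capabilities.probeServedWindow). The provider id ollama-local is hypothetical; substitute the id of your own Ollama entry.

```yaml
providers:
  entries:
    ollama-local:                   # hypothetical provider id
      capabilities:
        probeServedWindow: false    # skip the boot-time probe for this provider
```

With the probe skipped there is no served value to reconcile against, so a mismatch between the configured contextWindow and what Ollama actually serves will not be reported at boot.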
Boot WARN — served below configured: When the probe discovers a served window smaller than
the configured contextWindow, Comis logs exactly one WARN per provider per boot —
"Ollama served context window below configured" — naming both numbers and the probed model
(the probe checks ONE model per provider: defaultModel, else the first models[] entry —
per-model probing is a known limitation). The hint names the fixes:
OLLAMA_CONTEXT_LENGTH=131072 ollama serve (substituting your configured window), or Modelfile
PARAMETER num_ctx 131072 (see the VRAM caveat above), and the opt-out
(providers.entries.<id>.capabilities.probeServedWindow: false). Healthy boots stay silent:
served at or above configured, equal windows, non-Ollama providers, and providers the probe
skipped all log nothing.

Exhaustion provenance (the served bind names its knobs): When a turn exhausts a
served-bound window, the context_exhausted error text carries the suffix
(model contextWindow 131072 but Ollama serves only 8192 — fix: OLLAMA_CONTEXT_LENGTH=131072 ollama serve, or Modelfile 'PARAMETER num_ctx 131072')
— both Ollama knobs plus the TRUE configured window. In logs and comis explain,
rawContextWindowTokens reports the configured window with windowCapSource: "served"
(previously the served value masqueraded as the model’s declared window with source "none").
When both the served window and a capability-class cap clamp, the message names the full chain:
(model contextWindow 131072, Ollama serves 50000, capped to 32000 by contextEngine.budget.effectiveContextCapSmall — raise it (0 = uncapped) or reduce active tool schemas).
The cap wording is branched by the lever that actually binds: the
contextEngine.budget.effectiveContextCapSmall/Nano form above appears only when the budget-side
cap genuinely clamped (raising that key works); when the window was instead capped upstream by an
operator providers.entries.<id>.capabilities.capabilityClass pin (the executor’s per-class
default — small 32000 / nano 16000 — which never reads the budget keys), the suffix reads
capped to 32000 by providers.entries.<id>.capabilities.capabilityClass — pin a higher class (or remove the pin) or reduce active tool schemas
and windowCapSource reports "capabilityClass" — on that branch raising
contextEngine.budget.effectiveContextCapSmall (or setting it to 0) changes nothing, so the
error and comis explain name the pin instead of that dead knob.
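The two levers can be sketched side by side. Both key paths are copied from the error suffixes quoted above; the provider id ollama-local is hypothetical, and this passage does not say where contextEngine sits in the file (top level or per agent), so treat its placement as an assumption and check it against the Context Engine section.

```yaml
# Lever 1: applies only when the budget-side cap is what clamped the window.
contextEngine:                      # ASSUMPTION: placement not confirmed here
  budget:
    effectiveContextCapSmall: 0     # 0 = uncapped, per the error hint

# Lever 2: applies when windowCapSource reports "capabilityClass".
providers:
  entries:
    ollama-local:                   # hypothetical provider id
      capabilities:
        capabilityClass: mid        # pin a higher class, or delete the key to remove the pin
```

Only the lever named in the error text has any effect; changing the other one leaves the effective window unchanged.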
The reconcile line — "Context window reconciled (served or capability cap bound)", with
source / effectiveWindow / configured / served / capabilityCap fields — logs at INFO
once per session (the first reconciled turn; a session reset grants a fresh INFO) and at
DEBUG per turn. A session whose configured window simply wins logs no reconcile line at any
level (nothing was reconciled). The served window is provider-scoped: it clamps only
executions that resolve to the provider it was probed from — a per-execution model override to
another provider (a graph node’s model: anthropic:... on an Ollama-primary agent, a subagent
spawn) keeps that model’s full window and never gets "Ollama serves only N" attribution.

Boot viable floor (minViable): At boot, per agent, Comis computes
minViable = bootstrapTotalTokens + toolSchemaTokens + outputHeadroomFloor + freshTailReserve + safetyMargin
— each term single-sourced from its turn-time preflight home (the scaffold bootstrap budget,
the tool-schema overhead estimate, the output headroom at the post-downshift minimum thinking
level, the per-class preamble reserve, and the token-budget safety margin) — and WARNs when the
effective window cannot fit even that floor:
"Boot viable-floor check: effective window below minViable — agent will degrade on real turns (WARN-only, boot continues)".
The hint spells the full equation with every term’s value, e.g.
minViable = bootstrapTotalTokens(1429) + toolSchemaTokens(9714) + outputHeadroomFloor(1792) + freshTailReserve(2000) + safetyMargin(2048) = 16983 exceeds effectiveWindow 8192 [source: served],
followed by the knob for the binding window source — served: the two Ollama knobs above;
capability: pin a higher class (or remove the pin) via
providers.entries.<id>.capabilities.capabilityClass (the contextEngine.budget.* caps cannot
raise this bind); configured: providers.entries.<id>.models[].contextWindow. When tool schemas
dominate the floor, the hint adds the active-tool-ceiling lever: pin capabilityClass (the small class
defers to a 24-tool active ceiling via discover_tools) or disable unused MCP servers /
builtin tool groups. WARN-only: Comis never refuses to boot below the floor (the adapt-down
posture) — and because the boot floor and the turn-time preflight share one source module and
one tool corpus (the boot toolSchemaTokens term measures the same converted tool
definitions — lean descriptions plus guidelines — the turn actually ships, not the raw factory
descriptions), the same numbers re-appear in any later turn-time context_exhausted for that
agent.

The boot viable-floor WARN is engine-AGNOSTIC: it fires for "dag" and "pipeline" agents
alike (the minViable arithmetic holds regardless of engine); the per-turn preflight surfaces
are dag-only — see the engine-scope note in the
Context Engine section. Fleet-wide, under-served
providers surface as the config_posture:served_below_configured finding in
comis fleet and the obs.fleet.health RPC (see the
JSON-RPC reference).
Processing phase emoji reactions. When enabled, the agent reacts to messages with emoji reflecting current phase (thinking, tool use, generating, done, error).
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable lifecycle reactions globally |
| emojiTier | enum | "unicode" | Emoji set: unicode, platform, custom |
Timing (lifecycleReactions.timing)
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| debounceMs | number | 700 | Debounce before committing a phase transition |
| holdDoneMs | number | 3000 | How long to hold done emoji |
| holdErrorMs | number | 5000 | How long to hold error emoji |
| stallSoftMs | number | 15000 | Soft stall warning threshold in ms |
| stallHardMs | number | 30000 | Hard stall warning threshold in ms |
Per-Channel (lifecycleReactions.perChannel.)
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | (unset) | Override enabled state |
| emojiTier | enum | (unset) | Override emoji tier |
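As a sketch of how the three tables above combine, the fragment below enables reactions globally, keeps the documented timing defaults, and switches reactions off for one channel. The enclosing parent section is not shown in this part of the reference, so the fragment starts at lifecycleReactions. The telegram key under perChannel is an assumption: the table heading does not say what the per-channel map is keyed by.

```yaml
lifecycleReactions:
  enabled: true          # default is false
  emojiTier: unicode     # unicode | platform | custom
  timing:
    debounceMs: 700      # documented default
    holdDoneMs: 3000     # documented default
    holdErrorMs: 5000    # documented default
    stallSoftMs: 15000   # documented default
    stallHardMs: 30000   # documented default
  perChannel:
    telegram:            # ASSUMPTION: map key is a channel name
      enabled: false     # override: no reactions on this channel
```

Because validation is strict, a misplaced or misspelled key here fails at startup rather than being ignored.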
responsePrefix
Response prefix/suffix template injected into agent replies.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| template | string | "" | Template string (empty = disabled). Supports variables like {agent.emoji}, {model\|short}. |
| position | enum | "prepend" | Insert position: prepend or append |
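A short illustration of the two keys, using only the template variables named in the table. The bracket layout of the template is an arbitrary choice for the example, and the parent section that holds responsePrefix is not shown in this part of the reference.

```yaml
responsePrefix:
  template: "{agent.emoji} [{model|short}] "   # empty string disables the prefix
  position: prepend                             # prepend | append
```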
deliveryTiming
Inter-block delivery pacing to simulate natural typing rhythm.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| mode | enum | "natural" | Mode: off, natural, custom, adaptive |
| minMs | number | 800 | Minimum delay in ms between blocks |
| maxMs | number | 2500 | Maximum delay in ms between blocks |
| jitterMs | number | 200 | Random jitter in ms |
| firstBlockDelayMs | number | 0 | Extra delay before first block |
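The fragment below sketches a faster pacing profile. The table does not spell out which modes read minMs, maxMs, and jitterMs; the example assumes custom does, so verify the behavior before relying on it. The parent section that holds deliveryTiming is not shown in this part of the reference.

```yaml
deliveryTiming:
  mode: custom            # off | natural | custom | adaptive
  minMs: 500              # default 800
  maxMs: 1500             # default 2500
  jitterMs: 100           # default 200
  firstBlockDelayMs: 0    # default; no extra wait before the first block
```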
coalescer
Block coalescer that accumulates small streaming blocks before delivery.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| minChars | number | 0 | Blocks below this are always coalesced |
| maxChars | number | 500 | Flush threshold |
| idleMs | number | 1500 | Idle timeout in ms before flushing |
| codeBlockPolicy | enum | "standalone" | Code block handling: standalone or coalesce |
| adaptiveIdle | boolean | false | Adapt timeout to accumulated block length |
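A sketch of a coalescer tuned for longer, less frequent messages. All keys come from the table above; the values are illustrative only, and the parent section that holds coalescer is not shown in this part of the reference.

```yaml
coalescer:
  minChars: 0                   # default
  maxChars: 800                 # flush threshold; default 500
  idleMs: 1000                  # idle timeout before flushing; default 1500
  codeBlockPolicy: standalone   # standalone | coalesce
  adaptiveIdle: true            # default false
```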
senderTrustDisplay
Controls how sender identity is surfaced to the LLM in the message envelope.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Include sender identity in envelope |
| displayMode | enum | "hash" | Mode: raw (platform ID), hash (HMAC prefix), alias (operator name) |
| hashPrefix | number | 8 | Hex characters from HMAC digest (4-16) |
| hashSecretRef | string | "" | SecretManager key for HMAC secret |
| aliases | Record<string, string> | {} | Sender ID to alias mapping |
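A sketch of the hash mode with an alias map kept alongside it. SENDER_HASH_SECRET, the sender ID, and the alias are hypothetical names for illustration; the aliases map is presumably consulted only when displayMode is alias, which the table implies but does not state. The parent section that holds senderTrustDisplay is not shown in this part of the reference.

```yaml
senderTrustDisplay:
  enabled: true                       # default false
  displayMode: hash                   # raw | hash | alias
  hashPrefix: 8                       # hex characters kept; allowed range 4-16
  hashSecretRef: SENDER_HASH_SECRET   # hypothetical SecretManager key name
  aliases:
    "123456789": "alice"              # hypothetical sender ID and alias
```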
documentation
Documentation links injected into the system prompt so the agent can reference them.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable documentation link injection |
| localDocsPath | string | "" | Filesystem path to local docs |
| publicDocsUrl | string | "" | Public documentation URL |
| sourceUrl | string | "" | Source code repository URL |
| communityUrl | string | "" | Community or support URL |
| skillsMarketplaceUrl | string | "" | Skills marketplace URL |
| mcpRegistryUrl | string | "" | MCP server registry URL |
| customLinks | DocumentationLink[] | [] | Additional custom links (label + url) |
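A sketch showing a partial configuration: only the links you set are listed, and the rest stay at their empty-string defaults. The path and URLs are placeholders, and the label and url field names follow the "label + url" note in the table. The parent section that holds documentation is not shown in this part of the reference.

```yaml
documentation:
  enabled: true                                 # default false
  localDocsPath: "/opt/comis/docs"              # hypothetical path
  publicDocsUrl: "https://docs.example.com"     # placeholder URL
  customLinks:
    - label: "Runbook"                          # hypothetical link
      url: "https://wiki.example.com/runbook"
```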
telegramFileRefGuard
Detects hallucinated file paths in responses destined for Telegram (where file:// links are meaningless).
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | true | Enable file reference guard |
| additionalExtensions | string[] | [] | Extra file extensions to detect |
| excludedExtensions | string[] | [] | Extensions to exclude from detection |
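A sketch of extending and narrowing the guard's extension list. The table does not say whether entries take a leading dot; the bare form used here is an assumption, so confirm the expected format before copying it. The parent section that holds telegramFileRefGuard is not shown in this part of the reference.

```yaml
telegramFileRefGuard:
  enabled: true                       # default
  additionalExtensions: ["ipynb"]     # ASSUMPTION: no leading dot
  excludedExtensions: ["md"]          # ASSUMPTION: no leading dot
```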
deliveryQueue
Crash-safe outbound delivery queue. Messages are persisted to SQLite before delivery attempts, surviving daemon restarts.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | true | Enable the delivery queue. When false, messages bypass persistence. |
| maxQueueDepth | number | 10000 | Maximum entries allowed in the queue. Enqueue rejects when full. |
| defaultMaxAttempts | number | 5 | Maximum delivery attempts before marking as failed. |
| defaultExpireMs | number | 3600000 | Time-to-live in ms before an entry expires (1 hour). |
| drainOnStartup | boolean | true | Drain pending entries on daemon startup (crash recovery). |
| drainBudgetMs | number | 60000 | Maximum time in ms for startup drain before continuing. |
| pruneIntervalMs | number | 300000 | Interval in ms between prune sweeps for expired entries. |
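A sketch of a queue tuned for a flaky upstream: more attempts and a longer time-to-live, with everything else left at the documented defaults. The values are illustrative, and the parent section that holds deliveryQueue is not shown in this part of the reference.

```yaml
deliveryQueue:
  enabled: true
  maxQueueDepth: 10000         # default
  defaultMaxAttempts: 8        # default 5
  defaultExpireMs: 21600000    # 6 hours; default 3600000 (1 hour)
  drainOnStartup: true         # default; crash recovery
  drainBudgetMs: 60000         # default
  pruneIntervalMs: 300000      # default
```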
Observability persistence layer. Stores channel health snapshots, execution metrics, and system telemetry in SQLite for historical analysis and dashboards.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| persistence.enabled | boolean | true | Enable observability persistence. |
| persistence.retentionDays | number | 30 | Days to retain data before pruning (1-365). |
| persistence.snapshotIntervalMs | number | 300000 | Interval in ms between channel health snapshots (min 60000). |
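A sketch of a longer retention window with less frequent snapshots. The dotted keys in the table expand to a nested persistence block; the enclosing section name observability is an assumption inferred from the description, since the heading is not shown in this part of the reference. With strict validation, a wrong section name fails at startup.

```yaml
observability:                    # ASSUMPTION: section name inferred, not confirmed
  persistence:
    enabled: true                 # default
    retentionDays: 90             # allowed 1-365; default 30
    snapshotIntervalMs: 600000    # 10 minutes; minimum 60000, default 300000
```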
Per-recall ranking trace written as bounded JSONL for “why did recall pick X?” debugging. Opt-in (default off) — unlike its cacheTrace sibling (which defaults true), the recall trace is only captured during a focused debug session.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable the recall-trace writer (opt-in). Also honors the COMIS_DISABLE_RECALL_TRACE env hard-off. |
| filePath | string | (optional) | Full path override. When unset, resolves to ~/.comis/logs/recall-trace.jsonl (tilde-prefix supported). |
| maxFileBytes | number | 52428800 | Per-file byte cap (50 MB; positive). |
The recall trace has no raw-content opt-in — there is intentionally no includeMessages / includeSystem / includePrompt slot (unlike cacheTrace). Every payload is full-sanitized (bound, then sanitize, then redact) before it touches disk. There is no way to disable that sanitization.
The executor.broker block is wired into the daemon (AppConfigSchema → setupBroker): adding it to config.yaml starts the broker at boot — a TCP listener plus a 0600 unix socket at ~/.comis/broker.sock.
~/.comis/config.yaml
# executor.broker.bindings: provider-agnostic; presets are optional sugar
# The broker starts at daemon boot whenever an executor.broker block is present.
executor:
  broker:
    bindings:
      # Option A: built-in preset, Anthropic (header injection)
      - preset: anthropic
        secretRef: ANTHROPIC_EXECUTOR_KEY
      # Option B: built-in preset, Finnhub (query param injection)
      - preset: finnhub
        secretRef: FINNHUB_API_KEY
      # Option C: custom binding, any host, no preset required
      # A binding with no 'inject' defaults to Authorization: Bearer
      - hostRules:
          - pattern: { kind: exact, host: my-internal-api.example.com }
        inject: []   # defaults to Authorization: Bearer
        secretRef: INTERNAL_API_TOKEN
      # Option D: custom binding with explicit header injection
      - hostRules:
          - pattern: { kind: suffix, suffix: .amazonaws.com }
        inject:
          - kind: setHeader
            name: x-amz-security-token
            format: raw
        secretRef: AWS_SESSION_TOKEN