Context Management

What it does. Every time your agent is about to call its AI provider, Comis reshapes the conversation so the most relevant parts fit inside the model’s context window, without the agent forgetting the task or paying for tokens it does not need.

Who it is for. Anyone whose agent runs for more than a few exchanges. If your agent answers one question and goes silent, you can ignore this page. If it lives in a chat channel, schedules its own work, or keeps coming back to the same project for days, this is the system that keeps it sharp.

The plain-language version

A long conversation is a problem for two reasons:

The model has a hard limit on how much text it can read at once (its “context window”). Push past it and the call fails.
Every token you send costs money. Sending the entire chat history on every turn gets expensive fast.

Comis ships the lossless DAG (LCD) context engine as the default: contextEngine.version defaults to "dag" (the v2.12 “Lossless Context DAG” release). Instead of dropping old content, DAG keeps the whole conversation losslessly and zoomably compresses the oldest history under a token budget. The simpler pipeline engine is the first-class opt-in: set version: "pipeline" to use it. It is a fixed chain of layers that drops, masks, and (as a last resort) summarizes old content before each call, and it is lighter and fully deterministic. Most of this page explains the pipeline engine; DAG mode is summarized in the next section. For the prompt-cache mechanics that keep cached reads stable across these transformations, see Cache. For the LLM-backed summarization step in detail, see Compaction.

DAG mode (the default engine)

The DAG (LCD) engine is the default engine: contextEngine.version defaults to "dag" (set "pipeline" to opt into the simpler engine). DAG stores the full faithful conversation history losslessly (every message, tool call, and tool result preserved verbatim, paired by id), keeps a verbatim fresh tail of the most recent steps, and repairs the transcript so the model always receives a provider-valid tool_use↔tool_result pairing. This is what fixes the re-read loop that a lossy reconstruction caused.

Leaf summarization, the multi-tier condensed hierarchy, and budget eviction are now part of DAG. When context utilization crosses contextThreshold (default 0.75 × the turn’s effective budget window, that is, the reconciled context window under any capability-class cap), the engine summarizes the oldest out-of-tail chunk into a leaf summary and assembles the context under the token budget. It always keeps the verbatim fresh tail and never deletes anything from the store (compressed detail stays recoverable). When enough same-depth summaries accumulate (≥condensedMinFanout, default 4), they fold into one deeper condensed summary, forming a zoomable leaf→condensed hierarchy.

Every summary is rendered honestly: it carries depth/descendant_count/time-range/trust=untrusted markers plus an “Expand for details about:” footer, the body is taint-wrapped (untrusted by role), and in dag mode the system prompt adds an uncertainty clause telling the model to treat summaries as lossy recall cues and prefer newer evidence.

The in-session expansion tools (ctx_search, ctx_inspect, ctx_expand) let the agent drill back into a compressed region: full-text search over this conversation’s compressed history, inspect a summary’s coverage, and rehydrate a region to its underlying messages. They are active only in DAG mode, never-export, and distinct from cross-session recall (memory_search, session_search); see Context expansion tools. In pipeline mode, session_search searches the raw session history instead. For the condensation + honest-presentation mechanics see Compaction.
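The threshold check and the leaf→condensed fold described above can be sketched in a few lines. This is a minimal illustration, not the engine's code: the `Summary` class, `should_summarize`, and `fold` are hypothetical names, and only the two defaults (0.75 and 4) come from the text.

```python
from dataclasses import dataclass, field

CONTEXT_THRESHOLD = 0.75   # contextThreshold default quoted in the text
CONDENSED_MIN_FANOUT = 4   # condensedMinFanout default quoted in the text


@dataclass
class Summary:
    depth: int              # 0 = leaf summary, 1+ = condensed summary
    descendant_count: int   # raw messages covered by this summary
    children: list = field(default_factory=list)


def should_summarize(used_tokens: int, budget_window: int) -> bool:
    """True once utilization reaches the threshold share of the budget window."""
    return used_tokens >= CONTEXT_THRESHOLD * budget_window


def fold(summaries: list) -> list:
    """Fold same-depth summaries into one deeper summary until none qualify.

    Nothing is deleted: folded summaries stay reachable as children,
    which is what keeps the hierarchy zoomable.
    """
    summaries = list(summaries)
    while True:
        by_depth = {}
        for s in summaries:
            by_depth.setdefault(s.depth, []).append(s)
        depth = next(
            (d for d in sorted(by_depth) if len(by_depth[d]) >= CONDENSED_MIN_FANOUT),
            None,
        )
        if depth is None:
            return summaries
        group = by_depth[depth]
        parent = Summary(depth + 1, sum(s.descendant_count for s in group), group)
        # Put the parent where the first folded summary sat, to keep ordering.
        out, placed = [], False
        for s in summaries:
            if s.depth != depth:
                out.append(s)
            elif not placed:
                out.append(parent)
                placed = True
        summaries = out
```

With four leaf summaries of eight messages each, `fold` returns a single depth-1 summary covering 32 messages; with three it returns the leaves unchanged.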

Relevance-first vs recency-first history assembly. Budget eviction defaults to recency-first (the newest history steps that fit are kept). For small/nano models on a non-caching provider, history is instead assembled relevance-first: a margin arbiter allocates the contended history budget across tiers by fused rank, while the verbatim fresh tail and the security-pinned items stay unconditional. frontier/mid models and any prompt-caching model keep recency-first and are byte-identical to prior releases (the arbiter never runs for them, because reordering would break their prompt cache). The policy is capability-gated with explicit override: set contextEngine.relevance.firstByDefault to force it either way (precedence: explicit > capability default > off). See Relevance Policy.

Cache-stable relevance eviction (the evictable middle band). On that same relevance-first path, the contended middle band of history (the steps that are neither the protected fresh tail nor security-pinned) is ranked by relevance rather than pure recency before it is trimmed to fit: the band is scored against the last few user turns using the in-session full-text (BM25) signal, the most-relevant steps are kept under the budget, and chronological order is restored before assembly (relevance drives only which steps survive, never their order). This recovers detail that a strict newest-first trim would have dropped on a tight window. It is cache-stable by design: on any prompt-caching profile the band is not reordered (recency is preserved so the cached prefix stays byte-identical), and it only re-ranks on a non-caching profile or an already-breaking turn. Security-pinned items are never evicted by this pass, and a tool_use/tool_result pair is always kept or dropped together. There is no separate config key: it follows the same contextEngine.relevance.firstByDefault + capability gate above; frontier/mid are unchanged.
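A rough sketch of the middle-band selection may help make the rules concrete. It is an illustration under assumptions: `Step`, `overlap_score`, and `select_middle_band` are hypothetical names, and a plain term-overlap count stands in for the BM25 signal the text mentions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    index: int                      # chronological position in the history
    text: str
    tokens: int
    pinned: bool = False            # security-pinned: never evicted
    pair_id: Optional[str] = None   # shared by a tool_use and its tool_result


def overlap_score(text: str, query_terms: set) -> int:
    """Stand-in relevance signal (not BM25): query terms present in the step."""
    return len(query_terms & set(text.lower().split()))


def select_middle_band(band: list, recent_user_turns: list, budget: int) -> list:
    query = set(" ".join(recent_user_turns).lower().split())

    # Group steps so a tool_use/tool_result pair is kept or dropped as one unit.
    groups = {}
    for step in band:
        groups.setdefault(step.pair_id or ("solo", step.index), []).append(step)

    kept, candidates, remaining = [], [], budget
    for group in groups.values():
        if any(s.pinned for s in group):
            kept.extend(group)                      # pinned items are unconditional
            remaining -= sum(s.tokens for s in group)
        else:
            candidates.append(group)

    # Most relevant first; newer groups win ties.
    candidates.sort(
        key=lambda g: (-sum(overlap_score(s.text, query) for s in g),
                       -max(s.index for s in g))
    )
    for group in candidates:
        cost = sum(s.tokens for s in group)
        if cost <= remaining:
            kept.extend(group)
            remaining -= cost

    # Relevance decides which steps survive, never their order.
    return sorted(kept, key=lambda s: s.index)
```

The final sort is the important design point: ranking only chooses survivors, and the survivors are reassembled chronologically.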

The ten layers (pipeline mode, the opt-in engine)

Pipeline mode is the opt-in engine (set contextEngine.version: "pipeline"; the default is "dag"). Comis runs ten layers in fixed order before every LLM call. Layers further up the list run first; later layers see whatever the earlier ones produced.

#	Layer	What it does
1	Thinking-block cleaner	Strips reasoning blocks from earlier turns of reasoning models so old “scratch paper” does not pile up.
2	Signature replay scrubber	Removes cached signatures when a cache-break-induced replay is happening, so the same content is not billed twice.
3	Signature surrogate guard	Scrubs UTF-16 surrogate halves out of thinking blocks so provider sanitization does not reject the request.
4	Reasoning-tag stripper	Removes inline reasoning XML tags from non-Anthropic providers (e.g. `<thinking>…</thinking>` inside the visible text).
5	History window	Keeps the last N turns (15 by default). Earlier turns are dropped wholesale before any other shaping.
6	Dead-content evictor	Removes superseded tool results — older copies of the same tool call when a newer one exists.
7	Observation masker	When the conversation crosses 120K characters, masks old tool outputs in three tiers: protected (kept), standard (content masked, metadata kept), ephemeral (removed entirely).
8	LLM compaction	When the prompt is past ~85% of the budget window, calls the model to summarize older turns into one assistant message. On the pipeline engine this budget derives from the configured `contextWindow`; served-window and capability caps apply on the default dag engine (see the served-window verdict). Cooldown of 5 turns prevents thrashing. See Compaction.
9	Rehydration	After compaction runs, re-injects critical references (the agent’s `AGENTS.md`, files referenced by ID) so the summarized agent does not lose its bearings.
10	Objective reinforcement	Always runs last. Re-injects the original objective so the agent never drifts after being summarized.

The README has historically said “8 layers.” The shipped pipeline runs 10. Layers 1–8 are the ones that shape every turn; layers 9–10 only fire after compaction. Either way, all ten are real and instrumented.
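The "fixed order, each layer sees the previous output" idea can be sketched as a list of functions applied in sequence. This is a simplified illustration, not the shipped code: the message shape (dicts with `role`, `content`, and a `call_key` on tool results) and the function names are assumptions, and only two of the ten layers are modelled.

```python
from typing import Callable

Layer = Callable[[list], list]


def history_window(max_turns: int) -> Layer:
    """Layer 5 analogue: keep everything from the Nth-most-recent user message on."""
    def layer(messages: list) -> list:
        user_positions = [i for i, m in enumerate(messages) if m["role"] == "user"]
        if len(user_positions) <= max_turns:
            return messages
        return messages[user_positions[-max_turns]:]
    return layer


def dead_content_evictor(messages: list) -> list:
    """Layer 6 analogue: drop a tool result when a newer one shares its call key."""
    latest = {}
    for i, m in enumerate(messages):
        if m["role"] == "tool":
            latest[m["call_key"]] = i   # later positions overwrite earlier ones
    return [
        m for i, m in enumerate(messages)
        if m["role"] != "tool" or latest[m["call_key"]] == i
    ]


def run_pipeline(messages: list, layers: list) -> list:
    """Apply layers in list order; each one sees the previous layer's output."""
    for layer in layers:
        messages = layer(messages)
    return messages
```

A call such as `run_pipeline(history, [history_window(15), dead_content_evictor])` mirrors the table's ordering: the window trims first, so the evictor only inspects what survived.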

Defaults at a glance

agents:
  default:
    contextEngine:
      enabled: true
      version: pipeline       # opt-in — the default is "dag", the lossless LCD engine
      thinkingKeepTurns: 10
      historyTurns: 15        # recent user turns kept in full
      evictionMinAge: 15
      observationKeepWindow: 25
      observationTriggerChars: 120000
      ephemeralKeepWindow: 10
      compactionCooldownTurns: 5
      compactionPrefixAnchorTurns: 2

You rarely need to change these. The full list is on the config-yaml reference.
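To show how the observation defaults might interact, here is a hedged sketch of three-tier masking. The exact semantics are assumptions: a `tier` field on tool messages, age measured in messages rather than turns, and the function name are all hypothetical; only the three default values and the `[MASKED_TOOL_RESULT]` placeholder appear on this page.

```python
OBSERVATION_TRIGGER_CHARS = 120_000   # observationTriggerChars default
OBSERVATION_KEEP_WINDOW = 25          # observationKeepWindow default
EPHEMERAL_KEEP_WINDOW = 10            # ephemeralKeepWindow default
MASK = "[MASKED_TOOL_RESULT]"


def mask_observations(
    messages: list,
    trigger_chars: int = OBSERVATION_TRIGGER_CHARS,
    keep_window: int = OBSERVATION_KEEP_WINDOW,
    ephemeral_window: int = EPHEMERAL_KEEP_WINDOW,
) -> list:
    """Mask old tool outputs once the conversation exceeds the trigger size."""
    total_chars = sum(len(m["content"]) for m in messages)
    if total_chars <= trigger_chars:
        return messages                   # below the trigger: nothing to do

    out = []
    last = len(messages) - 1
    for i, m in enumerate(messages):
        age = last - i                    # 0 = newest message (assumed unit)
        tier = m.get("tier", "standard")
        if m["role"] != "tool" or tier == "protected":
            out.append(m)                 # protected tier is always kept
        elif tier == "ephemeral":
            if age < ephemeral_window:
                out.append(m)             # recent ephemeral output survives
            # older ephemeral output is removed entirely
        elif age >= keep_window:
            out.append({**m, "content": MASK})   # content masked, metadata kept
        else:
            out.append(m)
    return out
```

Passing smaller windows as arguments makes the behaviour easy to try on a short transcript without building a 120K-character one.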

Circuit breaker

If any individual layer throws three times in a row inside one session, the engine disables that layer for the rest of the session and keeps going. The circuit-breaker event is logged at WARN with errorKind: "internal" and a hint that tells you which layer tripped.
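The rule above (three consecutive failures disable one layer for the session, everything else keeps running) can be sketched as a small wrapper. The class and method names are hypothetical; the log message and the `errorKind` field follow the wording on this page, while the `layer` field name is an assumption.

```python
import logging

logger = logging.getLogger("context_engine")
MAX_CONSECUTIVE_FAILURES = 3


class LayerBreaker:
    """Per-session circuit breaker for context layers."""

    def __init__(self) -> None:
        self.failures = {}     # layer name -> consecutive failure count
        self.disabled = set()  # layers switched off for this session

    def run(self, name: str, layer, messages: list) -> list:
        if name in self.disabled:
            return messages    # skip the layer, keep the pipeline moving
        try:
            result = layer(messages)
        except Exception:
            self.failures[name] = self.failures.get(name, 0) + 1
            if self.failures[name] >= MAX_CONSECUTIVE_FAILURES:
                self.disabled.add(name)
                logger.warning(
                    "Context layer disabled by circuit breaker",
                    extra={"errorKind": "internal", "layer": name},
                )
            return messages    # on failure, pass the input through unchanged
        self.failures[name] = 0   # a success resets the streak
        return result
```

Resetting the counter on success is what makes the rule "three times in a row" rather than "three times total".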

Before / after

A common shape: a 60-message Telegram conversation that has been picking up tool results all morning. Without the engine, the prompt would be ~180K characters by lunchtime. The three snapshots below show what the engine does to it.

Before any layers run (raw history):

[60 messages, ~180,000 characters]
- 12 user messages
- 12 agent replies
- 36 tool-call/tool-result pairs (file reads, web searches, command output)
- 8 thinking blocks from earlier reasoning passes

After layers 1-7 run (no compaction needed yet):

[~28 messages, ~95,000 characters]
- Last 15 turns kept whole
- 8 old thinking blocks dropped (layer 1)
- 6 superseded tool results dropped (layer 6)
- Older tool results before turn -25 masked to "[MASKED_TOOL_RESULT]" (layer 7)
- Original objective re-injected (layer 10)

After layers 1-10 run (long-running, compaction triggered):

[~12 messages, ~45,000 characters]
- LLM summary of turns 1-50 as one assistant message (layer 8)
- AGENTS.md and file references re-injected after the summary (layer 9)
- Last 10 turns kept verbatim
- Original objective restated at the end (layer 10)

The agent reading the third version still knows what its job is, still has the file references it needs, and the prompt fits comfortably inside Sonnet’s 200K window with room for the response.
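The third snapshot only appears once layer 8 decides to run. A minimal sketch of that decision, assuming the ~85% threshold is a strict comparison and that the cooldown counts turns since the last compaction (both interpretations are assumptions, as is the function name):

```python
from typing import Optional

COMPACTION_THRESHOLD = 0.85     # "~85% of the budget window" from the layer table
COMPACTION_COOLDOWN_TURNS = 5   # compactionCooldownTurns default


def should_compact(
    prompt_tokens: int,
    budget_window: int,
    turn: int,
    last_compaction_turn: Optional[int],
) -> bool:
    """Decide whether LLM compaction should run on this turn."""
    in_cooldown = (
        last_compaction_turn is not None
        and turn - last_compaction_turn < COMPACTION_COOLDOWN_TURNS
    )
    if in_cooldown:
        return False            # prevents back-to-back summarization (thrashing)
    return prompt_tokens > COMPACTION_THRESHOLD * budget_window
```

For a 200,000-token budget the threshold works out to 170,000 tokens, so a 180,000-token prompt compacts on turn 20, is held back on turn 22, and becomes eligible again on turn 25.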

Context engine mode (operator-only)

DAG is the default. contextEngine.version accepts "pipeline" and "dag"; omit it (or set "dag") for the lossless LCD engine described above, or set "pipeline" to opt into the simpler sequential-layer engine:

~/.comis/config.yaml

agents:
  default:
    contextEngine:
      version: pipeline   # opt into the simpler engine; omit (or "dag") for the default lossless LCD engine

contextEngine.version is operator-only. It is immutable to the agent’s config.patch RPC — an agent cannot switch its own engine mode. This is a deliberate security boundary: because the engine mode governs which recall/context tools an agent is exposed to, letting an agent self-switch the engine would let it change its own tool exposure. Only an operator editing the config (the path above) can change the engine.
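One way to picture this boundary is a patch validator that rejects any change touching an immutable path before merging. This is a sketch under assumptions: `apply_agent_patch`, `ImmutableKeyError`, and the dict-shaped config are hypothetical, and whether other keys such as historyTurns are agent-patchable is not stated on this page (it is used here only as an example).

```python
import copy

# Paths an agent-originated patch may never touch (relative to the agent config).
IMMUTABLE_PATHS = {("contextEngine", "version")}


class ImmutableKeyError(Exception):
    """Raised when an agent patch targets an operator-only key."""


def flatten(patch: dict, prefix: tuple = ()):
    """Yield (path, value) for every leaf in a nested patch."""
    for key, value in patch.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def apply_agent_patch(config: dict, patch: dict) -> dict:
    """Return a patched copy of config, refusing operator-only keys."""
    leaves = list(flatten(patch))
    for path, _ in leaves:
        # Reject the immutable key itself and any ancestor that would replace it.
        if any(path == imm[:len(path)] for imm in IMMUTABLE_PATHS):
            raise ImmutableKeyError(".".join(path))
    merged = copy.deepcopy(config)
    for path, value in leaves:
        node = merged
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return merged
```

Validating every leaf before merging means a rejected patch leaves the configuration completely untouched.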

How it interacts with the cache

The context engine and the prompt cache are two systems that have to cooperate or you will pay for it twice.

The cache wants the prompt prefix to be byte-for-byte identical to last turn.
The context engine wants to drop and rewrite older parts of the prompt.

The compromise: the engine tracks a cache fence index. Layers 1-10 never modify content before the fence on a turn — they only edit content after it. That means the cached prefix stays stable across turns, and the parts of the prompt that change live entirely after the fence where the provider expects them to vary. For the full mechanics — adaptive TTL, sub-agent spawn staggering, two-phase break detection, the 15+ shipped optimizations — see Cache.
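The fence rule can be sketched as "split at the fence, transform only the tail". The helper names and the dict message shape are hypothetical, and the digest is just one way to check the byte-stability property, not something this page says the engine computes.

```python
import hashlib
import json


def apply_after_fence(messages: list, fence: int, layer) -> list:
    """Run a layer on messages[fence:] only; the prefix is passed through as-is."""
    prefix, tail = messages[:fence], messages[fence:]
    return prefix + layer(tail)


def prefix_digest(messages: list, fence: int) -> str:
    """Hash the serialized prefix so two turns can be compared for byte-stability."""
    blob = json.dumps(messages[:fence], sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

If `prefix_digest` returns the same value before and after a layer runs, the cached prefix was not disturbed by that layer.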

When something looks wrong

Set daemon.logLevels.agent: "debug" and look for these log lines:

Log line	What it tells you
`Context engine pipeline complete`	Total messages in/out, total chars in/out, which layers fired.
`Observation masker masked N messages`	Layer 7 trimmed the oldest tool results. Normal once a session passes 120K chars.
`LLM compaction triggered`	Layer 8 ran. Includes `prunedMessages` count.
`Context layer disabled by circuit breaker`	One layer threw three times. Includes the layer name. Investigate; the rest of the engine keeps running.

The same metrics show up live in the dashboard at Observe → Context (the Context Engine View).

Related pages

Compaction

The LLM-backed summarization step (layer 8) explained in detail.

Cache

How the cache stays stable across the layers above.

Observability

Per-turn context-engine metrics and how to read them.

Agent Lifecycle

Where the engine fits in the per-message processing path.

