The plain-language version
A long conversation is a problem for two reasons:- The model has a hard limit on how much text it can read at once (its “context window”). Push past it and the call fails.
- Every token you send costs money. Sending the entire chat history on every turn gets expensive fast.
contextEngine.version defaults to "dag" (the v2.12 “Lossless Context DAG” release). DAG keeps the whole conversation losslessly instead of dropping old content, zoomably compressing the oldest history under a token budget. The simpler pipeline engine — a fixed chain of layers that drops, masks, and (as a last resort) summarizes old content before each call, lighter and fully deterministic — is the first-class opt-in: set version: "pipeline" to use it.
This page explains the pipeline engine. For the prompt-cache mechanics that keep cached reads stable across these transformations, see Cache. For the LLM-backed summarization step in detail, see Compaction.
DAG mode (the default engine)
The DAG (LCD) engine is the default engine —
contextEngine.version defaults to "dag" (set "pipeline" to opt into the simpler engine). DAG stores the full faithful conversation history losslessly (every message, tool call, and tool result preserved verbatim, paired by id), keeps a verbatim fresh tail of the most recent steps, and repairs the transcript so the model always receives a provider-valid tool_use↔tool_result pairing. This is what fixes the re-read loop that a lossy reconstruction caused. Leaf summarization, the multi-tier condensed hierarchy, and budget eviction are now part of DAG: when context utilization crosses contextThreshold (default 0.75 × the turn’s effective budget window — the reconciled context window under any capability-class cap), the engine summarizes the oldest out-of-tail chunk into a leaf summary and assembles the context under the token budget — always keeping the verbatim fresh tail, and never deleting anything from the store (compressed detail stays recoverable). When enough same-depth summaries accumulate (≥condensedMinFanout, default 4), they fold into one deeper condensed summary, forming a zoomable leaf→condensed hierarchy. Every summary is rendered honestly: it carries depth/descendant_count/time-range/trust=untrusted markers plus an “Expand for details about:” footer, the body is taint-wrapped (untrusted by role), and in dag mode the system prompt adds an uncertainty clause telling the model to treat summaries as lossy recall cues and prefer newer evidence. The in-session expansion tools (ctx_search, ctx_inspect, ctx_expand) let the agent drill back into a compressed region — full-text search over this conversation’s compressed history, inspect a summary’s coverage, and rehydrate a region to its underlying messages. They are active only in DAG mode, never-export, and distinct from cross-session recall (memory_search, session_search); see Context expansion tools. In pipeline mode, session_search searches the raw session history instead. For the condensation + honest-presentation mechanics see Compaction.Relevance-first vs recency-first history assembly. Budget eviction defaults to recency-first (the newest history steps that fit are kept). For
small/nano models on a non-caching provider, history is instead assembled relevance-first: a margin arbiter allocates the contended history budget across tiers by fused rank, while the verbatim fresh tail and the security-pinned items stay unconditional. frontier/mid models and any prompt-caching model keep recency-first and are byte-identical to prior releases (the arbiter never runs for them — reordering would break their prompt cache). The policy is capability-gated with explicit override: set contextEngine.relevance.firstByDefault to force it either way (precedence: explicit > capability default > off). See Relevance Policy.Cache-stable relevance eviction (the evictable middle band). On that same relevance-first path, the contended middle band of history (the steps that are neither the protected fresh tail nor security-pinned) is ranked by relevance rather than pure recency before it is trimmed to fit: the band is scored against the last few user turns using the in-session full-text (BM25) signal, the most-relevant steps are kept under the budget, and chronological order is restored before assembly (relevance drives only which steps survive, never their order). This recovers detail that a strict newest-first trim would have dropped on a tight window. It is cache-stable by design: on any prompt-caching profile the band is not reordered (recency is preserved so the cached prefix stays byte-identical), and it only re-ranks on a non-caching profile or an already-breaking turn. Security-pinned items are never evicted by this pass, and a tool_use/tool_result pair is always kept or dropped together. No separate config key — it follows the same contextEngine.relevance.firstByDefault + capability gate above; frontier/mid are unchanged.The ten layers (pipeline mode, the opt-in engine)
Pipeline mode is the opt-in engine (setcontextEngine.version: "pipeline"; the default is "dag"). Comis runs ten layers in fixed order before every LLM call. Layers further up the list run first; later layers see whatever the earlier ones produced.
| # | Layer | What it does |
|---|---|---|
| 1 | Thinking-block cleaner | Strips reasoning blocks from earlier turns of reasoning models so old “scratch paper” does not pile up. |
| 2 | Signature replay scrubber | Removes cached signatures when a cache-break-induced replay is happening, so the same content is not re-billed twice. |
| 3 | Signature surrogate guard | Scrubs UTF-16 surrogate halves out of thinking blocks so provider sanitization does not reject the request. |
| 4 | Reasoning-tag stripper | Removes inline reasoning XML tags from non-Anthropic providers (e.g. <thinking>…</thinking> inside the visible text). |
| 5 | History window | Keeps the last N turns (15 by default). Earlier turns are dropped wholesale before any other shaping. |
| 6 | Dead-content evictor | Removes superseded tool results — older copies of the same tool call when a newer one exists. |
| 7 | Observation masker | When the conversation crosses 120K characters, masks old tool outputs in three tiers: protected (kept), standard (content masked, metadata kept), ephemeral (removed entirely). |
| 8 | LLM compaction | When the prompt is past ~85% of the budget window, calls the model to summarize older turns into one assistant message. On the pipeline engine this budget derives from the configured contextWindow; served-window and capability caps apply on the default dag engine (see the served-window verdict). Cooldown of 5 turns prevents thrashing. See Compaction. |
| 9 | Rehydration | After compaction runs, re-injects critical references (the agent’s AGENTS.md, files referenced by ID) so the summarized agent does not lose its bearings. |
| 10 | Objective reinforcement | Always runs last. Re-injects the original objective so the agent never drifts after being summarized. |
The README has historically said “8 layers.” The shipped pipeline runs 10. Layers 1–8 are the ones that shape every turn; layers 9–10 only fire after compaction. Either way, all ten are real and instrumented.
Defaults at a glance
Circuit breaker
If any individual layer throws three times in a row inside one session, the engine disables that layer for the rest of the session and keeps going. The circuit-breaker event is logged atWARN with errorKind: "internal" and a hint that tells you which layer tripped.
Before / after
A common shape: a 60-message Telegram conversation that has been picking up tool results all morning. Without the engine, the prompt would be ~180K characters by lunchtime. With the engine: Before any layers run (raw history):Context engine mode (operator-only)
DAG is the default.contextEngine.version accepts "pipeline" and "dag"; omit it (or set "dag") for the lossless LCD engine described above, or set "pipeline" to opt into the simpler sequential-layer engine:
~/.comis/config.yaml
How it interacts with the cache
The context engine and the prompt cache are two systems that have to cooperate or you will pay for it twice.- The cache wants the prompt prefix to be byte-for-byte identical to last turn.
- The context engine wants to drop and rewrite older parts of the prompt.
When something looks wrong
Setdaemon.logLevels.agent: "debug" and look for these log lines:
| Log line | What it tells you |
|---|---|
Context engine pipeline complete | Total messages in/out, total chars in/out, which layers fired. |
Observation masker masked N messages | Layer 7 trimmed the oldest tool results. Normal once a session passes 120K chars. |
LLM compaction triggered | Layer 8 ran. Includes prunedMessages count. |
Context layer disabled by circuit breaker | One layer threw three times. Includes the layer name. Investigate; the rest of the engine keeps running. |
Related pages
Compaction
The LLM-backed summarization step (layer 8) explained in detail.
Cache
How the cache stays stable across the layers above.
Observability
Per-turn context-engine metrics and how to read them.
Agent Lifecycle
Where the engine fits in the per-message processing path.
