This page covers the most common issues you might encounter with Comis. Each entry includes the exact error message you will see, what caused it, and step-by-step instructions to fix it.
Use your browser’s find (Ctrl+F / Cmd+F) to search for the error message you are seeing.

Startup Issues

Error message:
FATAL: Bootstrap failed: Config file not found
What happened: The daemon cannot find your configuration file at the expected path.
How to fix:
  1. Check that the config file exists:
    ls -la ~/.comis/config.yaml
    
  2. If the file is missing, restore from the last-known-good backup:
    cp ~/.comis/config.last-good.yaml ~/.comis/config.yaml
    
  3. If using pm2, re-run setup to regenerate the ecosystem config with the correct path:
    node packages/cli/dist/cli.js pm2 setup
    
  4. If using systemd, verify COMIS_CONFIG_PATHS is set in /etc/comis/env
The daemon automatically saves a last-known-good config on every successful startup. Look for ~/.comis/config.last-good.yaml as a recovery option.
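Steps 1 and 2 above can be scripted. The following is a minimal sketch rather than anything shipped with Comis: the function name restore_config_if_missing is hypothetical, and it assumes config.yaml and config.last-good.yaml sit side by side in the data directory as described above.

```shell
# Restore config.yaml from the last-known-good backup when it is missing.
# Hypothetical helper; the directory defaults to ~/.comis as used in this page.
restore_config_if_missing() {
  dir="${1:-$HOME/.comis}"
  if [ -f "$dir/config.yaml" ]; then
    echo "present"                      # nothing to do
  elif [ -f "$dir/config.last-good.yaml" ]; then
    cp "$dir/config.last-good.yaml" "$dir/config.yaml" && echo "restored"
  else
    echo "no-backup"                    # neither file exists; manual recovery needed
    return 1
  fi
}
```

Calling restore_config_if_missing with no argument operates on ~/.comis; pass another directory to try it out safely first.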
Error message:
Secrets bootstrap failed: ...
What happened: The master encryption key used to protect stored secrets is missing or invalid.
How to fix:
  1. Check that SECRETS_MASTER_KEY is set in your environment or .env file:
    grep SECRETS_MASTER_KEY ~/.comis/.env
    
  2. If the key was lost, remove the secrets database to start fresh:
    rm ~/.comis/secrets.db
    
  3. Restart the daemon — it will create a new secrets database
  4. Re-add any secrets that were stored (API keys configured via the secrets system)
Error message:
Secret decryption failed: ...
What happened: The secrets database exists but cannot be decrypted, usually because the master key has changed since the secrets were stored.
How to fix:
  1. If you changed the master key, restore the original key value
  2. If the original key is lost, remove the secrets database and re-create:
    rm ~/.comis/secrets.db
    
  3. Restart the daemon and re-add your secrets
Error message:
SecretRef resolution failed: ...
What happened: Your configuration references a secret (using $secret:name syntax) that does not exist in the secrets store.
How to fix:
  1. Check your config.yaml for $secret: references
  2. For each reference, make sure the secret is stored — either add it via the API or replace the reference with the actual value
  3. Restart the daemon
Warning message:
Permission corrections on data directory
What happened: The daemon detected that files in ~/.comis/ have permissions that are too open. It attempted to fix them automatically.
How to fix:
Set restrictive permissions manually:
chmod -R 700 ~/.comis/
This ensures only your user can read, write, and access the data directory.
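To see which paths are affected before or after running chmod, a check along the following lines may help. This is a sketch under one assumption: that "too open" means any group or other permission bit is set. The daemon's actual check may differ, and find_open_paths is an illustrative name.

```shell
# List paths under a directory that grant any permission to group or other.
# Assumption: "too open" means any group/other bit is set.
find_open_paths() {
  dir="${1:-$HOME/.comis}"
  find "$dir" \( -perm -040 -o -perm -020 -o -perm -010 \
              -o -perm -004 -o -perm -002 -o -perm -001 \) -print
}
```

An empty result after chmod -R 700 indicates that no path under the directory is accessible to other users.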
Error message:
EADDRINUSE: address already in use :::4766
What happened: Another process is already using port 4766, which the gateway needs.
How to fix:
  1. Find what is using the port:
    lsof -i :4766
    
  2. If it is an old Comis process, stop it:
    pm2 stop comis
    
    Or:
    kill <PID from lsof output>
    
  3. If another application needs port 4766, change the Comis gateway port in config.yaml:
    gateway:
      port: 4767   # or any available port
    
  4. Restart the daemon
Error message:
Config validation failed: security: Unrecognized key(s) in object: 'oauth'
(The exact key name in the message varies: you may also see 'secrets' or a similar unrecognized-key name.)
What happened: Your config.yaml contains one or more configuration keys that were removed in v1.5. The security section uses strict schema validation, so any unrecognized key causes an immediate boot failure. There is no silent fallback.
The three keys removed in v1.5 are:
  • oauth.storage (under security:)
  • security.secrets.enabled
  • COMIS_DISABLE_ENCRYPTED_SECRETS (in ~/.comis/.env)
How to fix:
  1. Detect legacy keys in your config and env:
    grep -n "oauth\.storage\|secrets\.enabled" ~/.comis/config.yaml
    grep -n "COMIS_DISABLE_ENCRYPTED_SECRETS" ~/.comis/.env
    
  2. Remove each legacy key from the file it appears in:
    • Delete the oauth.storage: line under security: in config.yaml
    • Delete the security.secrets.enabled: line from config.yaml
    • Delete the COMIS_DISABLE_ENCRYPTED_SECRETS= line from .env
  3. Add the replacement key security.storage (if not already present):
    ~/.comis/config.yaml
    security:
      storage: encrypted   # encrypted (default) | file | env
    
  4. Verify boot:
    node packages/cli/dist/cli.js secrets list
    
For a full migration walkthrough, see Secrets — Migrating from Pre-v1.5 Configuration.
Error message (in logs):
Error [ERR_ACCESS_DENIED]: Access to this API has been restricted
or
Error [ERR_ACCESS_DENIED]: ... is disabled when permission model is enabled
What happened: The daemon is running under node --permission (the Node.js permission model), which categorically disables fd-based fs APIs (fsync, fchmod, fchown) at the process level. On pre-guard releases, calling fsyncSync on the data-dir lock file during boot threw ERR_ACCESS_DENIED before any log line was emitted, causing every start attempt to fail.
This behavior is Linux-only and occurs only when security.permission.enableNodePermissions: true is set or the systemd unit passes --permission in ExecStart.
Is this still a problem? Current Comis versions guard all fd-API call sites via isFsyncDisabledByPermissionModel, so the daemon will not crash on this. If you are seeing this error, you may be running an older release or a custom Node.js build that raises ERR_ACCESS_DENIED for a different reason.
How to fix:
  1. Check your Comis version:
    node packages/cli/dist/cli.js --version
    
    Ensure you are on v1.4 or later (the guard was introduced in v1.4).
  2. Check if --permission is active:
    ps aux | grep '[n]ode.*--permission'
    
    If you did not intend to enable the permission model, remove security.permission.enableNodePermissions: true from config.yaml and remove --permission from your systemd unit.
  3. If you want to keep --permission: Upgrade to v1.4 or later. The daemon handles the fd-API disablement gracefully (credential writes are best-effort durability; this is expected behavior).
For full detail on the fd-API impact, see Node Permissions — Production fd-API Disablement.

Channel Issues

Error message:
Telegram enabled but no bot token configured
Hint from the daemon: Set botToken in channels.telegram config or TELEGRAM_BOT_TOKEN env var
How to fix:
  1. Get a bot token from @BotFather on Telegram
  2. Add it to your config:
    channels:
      telegram:
        enabled: true
        botToken: "your-bot-token"
    
    Or set the environment variable TELEGRAM_BOT_TOKEN
  3. Restart the daemon
Error message:
Discord enabled but no bot token configured
Hint from the daemon: Set botToken in channels.discord config or DISCORD_BOT_TOKEN env var
How to fix:
  1. Get a bot token from the Discord Developer Portal
  2. Add it to your config:
    channels:
      discord:
        enabled: true
        botToken: "your-bot-token"
    
    Or set the environment variable DISCORD_BOT_TOKEN
  3. Make sure you have enabled the Message Content Intent in the Discord Developer Portal under Bot settings
  4. Restart the daemon
Error message:
Discord credential validation failed
Hint from the daemon: Verify DISCORD_BOT_TOKEN is valid in Discord Developer Portal
How to fix:
  1. Go to the Discord Developer Portal
  2. Select your application and navigate to Bot
  3. Verify the token matches what is in your config
  4. Check that the Message Content Intent is enabled (under Privileged Gateway Intents)
  5. If the token was reset, copy the new token and update your config
  6. Restart the daemon
Error message:
Slack credential validation failed
How to fix:
  1. Go to your Slack API dashboard
  2. Verify the Bot User OAuth Token matches your config
  3. Check that the Signing Secret is correct
  4. Make sure the bot has the required scopes (chat:write, channels:history, etc.)
  5. Restart the daemon
Error messages:
WhatsApp credential validation failed
Signal connection validation failed
How to fix for WhatsApp:
  1. Verify your WhatsApp Business API credentials
  2. Check that the phone number ID and access token are correct
  3. Make sure the WhatsApp Business account is active
How to fix for Signal:
  1. Check that signal-cli is installed and running on your server
  2. Verify the Signal phone number is registered with signal-cli
  3. Test signal-cli independently:
    signal-cli -u +1234567890 receive
    
  4. Restart the daemon

Runtime Issues

Warning message:
Gateway token auto-generated (ephemeral -- will be lost on restart)
Hint from the daemon: Set GATEWAY_TOKEN_<ID> in environment or secrets store for persistence
What happened: No gateway token was configured, so the daemon created a temporary one. This token will be different after every restart, meaning you will need to re-authenticate each time.
How to fix:
Set a permanent token in your config.yaml:
gateway:
  tokens:
    - id: "admin"
      secret: "your-secure-token-minimum-32-characters-long"
      scopes: ["rpc", "ws", "admin"]
The token secret must be at least 32 characters long.
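One way to produce a secret that satisfies the 32-character minimum is sketched below. It assumes a POSIX shell with /dev/urandom available. Both function names are illustrative, and Comis may apply further rules to token secrets beyond the length check shown here.

```shell
# Generate a 64-character hex secret from 32 random bytes.
generate_gateway_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Check only the documented rule: at least 32 characters.
is_valid_gateway_secret() {
  [ "${#1}" -ge 32 ]
}
```

The output of generate_gateway_secret can be pasted into the secret field of the gateway.tokens entry above.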
Error message:
Shutdown timeout exceeded, forcing exit
What happened: The daemon could not stop all components cleanly within the 30-second timeout. This can happen if an agent is in the middle of a long execution or a channel connection is hanging.
How to fix:
  1. Check the logs before the timeout to see which component was still running
  2. If this happens frequently, increase the timeout:
    daemon:
      shutdownTimeoutMs: 60000   # 60 seconds instead of 30
    
  3. If a specific component is always slow, investigate that component (check channel connections, agent executions in progress)
Warning message:
Unhandled promise rejection (non-fatal)
What happened: An unexpected error occurred in an asynchronous operation, but the daemon caught it and continued running. This does not affect normal operation but may indicate a bug in a tool or skill.
How to fix:
  1. Check the error details in the log line — it includes the rejection reason
  2. If it mentions a specific tool or skill, check that tool’s configuration
  3. If it persists, this may be a bug — check for updates or report the issue

Configuration Issues

Error message:
Config patch rate limit exceeded
What happened: Too many configuration changes were sent in rapid succession via the RPC API. The daemon rate-limits config patches to prevent accidental flooding.
How to fix:
  1. Wait a moment and retry the config change
  2. If you need to make many changes at once, batch them into a single config.apply call instead of multiple config.patch calls
Warning message about approvals configuration.
What happened: The approvals section in your configuration has an issue, typically a referenced approval policy that does not match any defined policy.
How to fix:
  1. Review the approvals section in your config.yaml
  2. Make sure all referenced policy names match defined policies
  3. Restart the daemon
No specific error message: the daemon shows "Comis daemon started" but agents ignore incoming messages.
How to fix:
  1. Check agent routing — make sure each agent’s bindings match the channels you are messaging from:
    agents:
      default:
        bindings:
          - channel: telegram
    
  2. Check agent status — the agent might be suspended. Check in the web dashboard or logs.
  3. Check budget — the agent’s token or cost budget might be exhausted. See Agent Safety.
  4. Check model provider — verify the API key for the configured model provider is valid
  5. Check logs — set daemon.logLevels.agent: "debug" to see detailed agent processing logs

Resilience Issues

See Resilience Architecture for how these systems work together.
Log event:
execution:prompt_timeout
What happened: An LLM call exceeded its prompt-level deadline. Two limits can fire: the stall budget (no stream or tool activity for promptTimeoutMs) or the makespan ceiling (promptTimeoutMs × stallCeilingMultiplier, which catches a streaming runaway). The agent automatically retried with a shorter timeout or fell back to an alternate model. Running comis explain <sessionKey> reports which limit fired and which setting governs it.
How to fix:
  1. Check the provider’s status page for ongoing outages
  2. If the model legitimately needs more silent time (e.g., slow local prefill before the first token), increase promptTimeoutMs:
    agents:
      default:
        promptTimeout:
          promptTimeoutMs: 300000   # 5 minutes
    
  3. Review modelFailover.fallbackModels for the agent to ensure fallback models are configured
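The makespan ceiling is a simple product, so it is worth computing before raising promptTimeoutMs. The sketch below uses the 300000 ms value from step 2. The multiplier of 3 is a hypothetical example value, not a documented default, and the function name is illustrative.

```shell
# Makespan ceiling = promptTimeoutMs x stallCeilingMultiplier (per the text above).
# The multiplier passed below is a hypothetical example, not a documented default.
makespan_ceiling_ms() {
  prompt_timeout_ms="$1"
  stall_ceiling_multiplier="$2"   # assumed to be an integer for shell arithmetic
  echo $(( prompt_timeout_ms * stall_ceiling_multiplier ))
}

makespan_ceiling_ms 300000 3   # prints 900000 (15 minutes)
```

With these example values the ceiling (900 seconds) is longer than the 600-second pipeline timeout described under execution:aborted, so the pipeline backstop would likely fire first.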
Log message:
Sub-agent watchdog timeout
What happened: A sub-agent ran longer than maxRunTimeoutMs. The watchdog force-failed the run and sent a failure notification to the channel.
How to fix:
  1. Check the sub-agent task — was it genuinely too complex, or did the LLM provider hang?
  2. If the task is legitimately long, increase maxRunTimeoutMs:
    security:
      agentToAgent:
        subagentContext:
          maxRunTimeoutMs: 900000   # 15 minutes
    
  3. Check for provider slowness via provider:degraded events in logs
Log event:
provider:degraded
What happened: Multiple agents hit failures within a short window. The provider health monitor flagged the provider as degraded. LLM-dependent operations are being skipped.
How to fix:
  1. Check the provider’s status page for ongoing outages
  2. Wait for automatic recovery — the system emits a provider:recovered event when the provider comes back
  3. Check the dead-letter queue for missed announcements (~/.comis/dead-letters.jsonl)
Log message:
Ghost run detected and force-failed
What happened: A sub-agent was still marked “running” past the grace period (maxRunTimeoutMs + 120 seconds). The ghost sweep force-failed it and delivered a failure notification to the channel.
How to fix:
  1. This usually indicates a process crash during execution
  2. Check for out-of-memory (OOM) or unhandled errors around the same time in logs
  3. The failure notification was already delivered to the channel — no action needed for the user
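When correlating a ghost-run log line with the start of the run, it helps to know when the sweep becomes eligible to fire. The following sketch applies the grace-period formula stated above to the 900000 ms example value used for maxRunTimeoutMs earlier on this page. The function name is illustrative.

```shell
# Ghost-sweep deadline = maxRunTimeoutMs + 120 seconds (grace period from the text above).
ghost_sweep_deadline_ms() {
  max_run_timeout_ms="$1"
  echo $(( max_run_timeout_ms + 120 * 1000 ))
}

ghost_sweep_deadline_ms 900000   # prints 1020000 (17 minutes)
```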
Log event:
announcement:dead_lettered
What happened: Announcement delivery failed and entries were saved to the dead-letter queue (~/.comis/dead-letters.jsonl). Retry will be attempted up to 5 times.
How to fix:
  1. Check channel connectivity — is the bot still connected to the channel?
  2. Verify the bot token is still valid
  3. Entries auto-expire after 1 hour
  4. On provider recovery the dead-letter queue drains automatically
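A quick way to watch the queue drain is to count its entries. The sketch below assumes only what the .jsonl extension implies, namely one JSON entry per line; it does not rely on any field names inside the entries. The function name is illustrative.

```shell
# Count non-empty lines (one entry per line) in the dead-letter queue.
# The default path comes from the text above.
dead_letter_count() {
  file="${1:-$HOME/.comis/dead-letters.jsonl}"
  if [ -f "$file" ]; then
    grep -c . "$file" || true   # grep exits non-zero when the count is 0
  else
    echo 0                      # no queue file means nothing is pending
  fi
}
```

A count that stays above zero well past the 1-hour expiry window would suggest new failures are still arriving.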
Log event:
execution:aborted  reason=pipeline_timeout
What happened: The entire agent execution exceeded the 600-second pipeline timeout. A static error message was sent to the user (no LLM call).
How to fix:
  1. Check if the model was slow — look for execution:prompt_timeout events before this
  2. This is a backstop — if it fires regularly, investigate per-call prompt timeouts first
  3. Consider adding fallback models via modelFailover.fallbackModels in the agent config

Memory & Recall Issues

Recall is the path that selects which memories enter the prompt. When it surprises you, you do not have to read code — two questions cover almost every case. (The artifacts and events behind this runbook are documented in Observability → Memory & Recall Diagnostics.)
Symptom: A recall injected a memory you did not expect, or omitted one you did, and you want to see the ranking that produced that result.
How to diagnose:
  1. Enable the recall trace (opt-in, default off). Add to config.yaml, then reload:
    diagnostics:
      recallTrace:
        enabled: true        # opt-in — redacted, bounded (50 MB) JSONL
    
  2. Reproduce the surprising recall (send the message that triggered it).
  3. Read the per-memory ranking breakdown with --format json:
    node packages/cli/dist/cli.js memory recall-trace <session> --format json
    
    The JSON record shows, for each candidate: which lanes matched (FTS / vector / entity), the fused order, the pre- and post-rerank scores, the recency / temporal / proof / trust score components, and the include/exclude reason for every memory. The table view (omit --format json) shows just the correlation keys (trace, session, finalCount, ts) for locating the right record first.
  4. Cross-reference the trust filter. If the memory you expected is excluded, the reason field will say why — most often a trust-level filter (includeTrustLevels) rather than a low score.
The recall-trace payload is full-sanitized — it records the ranking structure and scores, never raw memory bodies or query text. There is no toggle to include raw content; see Observability → Memory & Recall Diagnostics.
Turn recallTrace.enabled back to false once you have the answer — it is a debug-session writer, not a steady-state one.
Symptom: Recall adds noticeable latency to a turn.
How to diagnose, in order:
  1. Is the cross-encoder reranker on? rag.rerank.enabled is opt-in (default false) precisely because the reranker adds latency. If you enabled it, that is the most likely source:
    agents:
      default:
        rag:
          rerank:
            enabled: true       # the latency source — disable to A/B
            timeoutMs: 800       # the per-recall budget; fallback fires past this
    
    Disable it briefly to confirm the latency disappears. On timeout the reranker already falls back to the fusion-ranked order (memory:reranked.timedOut: true), so correctness is preserved — but the wait still happens up to timeoutMs.
  2. First-call reranker model load. The reranker is a local GGUF loaded on first use. The first recall after a restart (with rerank on) pays a one-time model-load cost; steady-state recalls do not. A single slow recall right after startup is expected.
  3. Vector index availability. If the vector lane cannot run, recall falls back to FTS-only and logs a WARN with errorKind + hint. Check for it:
    grep '"errorKind"' ~/.comis/logs/daemon.log | grep -i "rerank\|vec\|embed"
    
    A vec→FTS fallback shows as memory:recalled.vectorCandidates: 0.
Aggregate health: the comis memory stats recall-counter overlay surfaces the rerank-fallback rate, lane usage, consolidation throughput, and recall hit-rate — a climbing fallback rate means the reranker is timing out across the fleet, not just on one turn:
node packages/cli/dist/cli.js memory stats
The overlay is best-effort; a daemon that has not wired the counters still renders base stats. See the CLI reference for every comis memory subcommand.

Credential Broker

The credential broker intercepts HTTPS requests from driven-CLI spawns, injects the real API key, and forwards the request upstream — all without the key ever entering the sandbox. When something goes wrong, the broker emits a broker:denied or broker:credential_unavailable event and returns a specific HTTP status code. Use the playbooks below to diagnose each failure mode.
What happened: The driven-CLI’s proxy token was missing, forged, or already consumed. Each token is single-use, and one is issued per driven-CLI spawn. The broker emits broker:denied with reason: "bad_token".
Diagnose:
grep 'broker:denied' ~/.comis/logs/daemon.log | grep bad_token | tail -20
Common causes:
  • The CLI was not launched through the broker (HTTPS_PROXY env var not set to the broker socket).
  • The session was torn down before the request completed.
  • A token was replayed (single-use invariant — each token can only be consumed once).
Resolution: Ensure the CLI is launched via the daemon’s driven-CLI spawn path, not invoked manually without the broker environment.
What happened: Two broker:denied reasons produce 403:
  • no_binding — the requested host has no matching hostRules entry in any binding. Add a binding or use a built-in preset (anthropic, finnhub).
  • path_policy — the host is allow-listed but the path is blocked by the binding’s pathPolicy glob.
Diagnose:
grep 'broker:denied' ~/.comis/logs/daemon.log | grep -E '"reason":"(no_binding|path_policy)"' | tail -10
Resolution for no_binding: Add a hostRules entry for the target host in executor.broker.bindings, or use a matching preset.
Resolution for path_policy: Review the pathPolicy glob on the binding. Ensure the requested path matches the allowed pattern (e.g., /v1/*).
What happened: The broker could not resolve the secretRef from SecretManager. The broker emits broker:credential_unavailable and returns 502. The request is never forwarded upstream.
Diagnose:
grep 'broker:credential_unavailable' ~/.comis/logs/daemon.log | tail -10
# Check that the secret exists:
node packages/cli/dist/cli.js secrets list
Resolution: Ensure the secretRef in the binding exactly matches the key name returned by secrets list. If the secret is missing, add it:
node packages/cli/dist/cli.js secrets set YOUR_SECRET_KEY
What happened: A WebSocket upgrade from the credentialed sandbox returned 501 with broker:denied reason ws_upgrade_not_supported. This is an intentional fail-closed guard: WebSocket credential injection is not supported in this release. The error is returned explicitly rather than hanging silently.
Resolution: Use HTTPS (not WebSocket) for API calls that require credential injection. WS credential injection is on the roadmap.
Every broker log line carries step (pipeline stage), traceId, and agentId. To trace a full request:
# Find the traceId from a session_opened event for your agent:
grep 'broker:session_opened' ~/.comis/logs/daemon.log | grep '"agentId":"<your-agent>"' | tail -5

# Follow all events for that trace:
grep '"traceId":"<your-trace-id>"' ~/.comis/logs/daemon.log | jq -c '{step, event: (.event // .type), reason: .reason}'
Expected sequence for a successful injection:
broker:session_opened → broker:request → broker:injected → (upstream response) → broker:session_closed
If broker:denied appears in place of broker:injected, the reason field identifies the failure mode (see the entries above).
Filter by pipeline stage:
jq 'select(.step == "broker-inject")' ~/.comis/logs/daemon.log
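Alongside per-trace inspection, a tally of denial reasons shows which failure mode dominates. This is a sketch that assumes log lines are compact JSON containing a "reason":"..." field, as the grep examples above imply. The function name broker_denied_tally is illustrative.

```shell
# Tally broker:denied events by reason from a JSONL daemon log.
# Assumes compact JSON with a "reason":"..." field, as in the grep examples above.
broker_denied_tally() {
  log="${1:-$HOME/.comis/logs/daemon.log}"
  grep 'broker:denied' "$log" \
    | grep -o '"reason":"[^"]*"' \
    | sed 's/^"reason":"\(.*\)"$/\1/' \
    | sort | uniq -c \
    | awk '{ print $2 "=" $1 }'
}
```

Each output line has the form reason=count, for example bad_token=2, which points to the matching entry earlier in this section.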
What happened: The broker injected a credential and forwarded the request, but the upstream API returned 401. The secret value may be wrong or the injection rule kind may not match what the API expects.
Diagnose:
  1. Verify the secretRef key in the binding matches the stored secret name (secrets list).
  2. Check the stored secret value — update it with:
    node packages/cli/dist/cli.js secrets set YOUR_SECRET_KEY
    
  3. Check the injection rule kind — the anthropic preset injects x-api-key (raw, not Bearer). For OpenAI-compatible APIs use a custom binding with format: bearer.
  4. Look for broker:injected in logs to confirm injection actually occurred (vs. a 403 or 502 before it reached upstream).
Full Credential Broker reference →
