The default auto provider works without any setup; this page is mainly for
tuning, switching to local, or controlling cost.
Embeddings are what power the “meaning matching” side of memory search. They
convert text into a numerical representation that lets Comis find memories based
on meaning, not just matching words. You do not need to understand the math —
just choose a provider and Comis handles the rest.
What is wired today
Two embedding implementations ship in packages/memory:
- OpenAI embeddings (embedding-provider-openai.ts): calls the text-embedding-3-small model by default (1536 dimensions).
- Local embeddings (embedding-provider-local.ts): runs a small transformer model entirely on-device using a downloaded GGUF file.
packages/memory/src/embedding-provider-factory.ts chooses
between them based on the embedding.provider setting. The default
"auto" mode tries local first and falls back to OpenAI if loading fails.
Why embeddings matter
Without embeddings, memory search only finds exact word matches. With embeddings, your agent can find related memories even when different words are used. For example:
- A memory about “cooking pasta” would match a search for “making spaghetti”
- A memory about “budget planning” would match a search for “financial forecasting”
- A memory about “fixing the deployment” would match a search for “resolving the release issue”
Embedder vs reranker
Comis runs two different local GGUF models (both via node-llama-cpp) at two
different stages of recall. It is easy to confuse them, so the distinction
matters:
| | Embedder | Reranker |
|---|---|---|
| Model type | Bi-encoder | Cross-encoder |
| Example model | nomic-embed (local) / text-embedding-3-small (OpenAI) | bge-reranker-v2-m3 (local GGUF) |
| Config home | embedding.* (e.g. embedding.local.modelUri) | memory.rerankerModel + rag.rerank.* |
| What it does | Turns each memory into a vector once, ahead of time, feeding the vector search lane | Re-scores already-retrieved candidates by reading the query and candidate together |
| When it runs | At store/index time (and to embed the query) | After fusion, only on the top candidates |
| Default | On (the auto provider) | Opt-in (rag.rerank.enabled defaults false) |
Choosing a provider
Comis supports three embedding provider options:
- auto (default): Tries local first, then falls back to remote. This is the best option for most users because it works without any setup. Comis automatically downloads a local model and uses it. If the local model fails to load, it falls back to the OpenAI API.
- local: Runs entirely on your machine using a downloaded model. This option is free, private (no data leaves your machine), and works offline. It requires approximately 500 MB of disk space for the model and uses some RAM while running. GPU acceleration is used automatically when available.
- openai: Uses OpenAI’s embedding API. This option is fast and accurate but costs a small amount per search (fractions of a cent) and requires an OpenAI API key. Choose this if you want the highest quality embeddings or if your machine does not have enough resources for the local model.
Configuration
| Option | Type | Default | What it does |
|---|---|---|---|
| embedding.enabled | boolean | true | Enable embedding generation for meaning-based search |
| embedding.provider | string | "auto" | Provider: auto, local, or openai |
| embedding.local.gpu | string | "auto" | GPU acceleration: auto, metal, cuda, vulkan, or false |
| embedding.local.contextSize | number | 2048 | Maximum text length the local model can process (in tokens) |
| embedding.openai.model | string | "text-embedding-3-small" | OpenAI embedding model to use |
| embedding.openai.dimensions | number | 1536 | Vector dimensions for the OpenAI model |
| embedding.cache.maxEntries | number | 10000 | Maximum cached embeddings in L1 in-memory cache (0 = disabled) |
| embedding.cache.persistent | boolean | false | Enable persistent L2 SQLite cache for embeddings that survive restarts |
| embedding.cache.persistentMaxEntries | number | 50000 | Maximum entries in L2 persistent cache |
| embedding.cache.ttlMs | number | (none) | TTL in milliseconds for cache entries. When unset, LRU eviction only |
| embedding.cache.pruneIntervalMs | number | 300000 | How often to check for expired or excess cache entries (5 min default) |
| embedding.batch.batchSize | number | 100 | Number of memories processed per batch during indexing |
| embedding.batch.indexOnStartup | boolean | true | Index any un-embedded memories when Comis starts |
| embedding.autoReindex | boolean | true | Automatically re-index all memories when the provider changes |
Local provider example
~/.comis/config.yaml
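A minimal sketch of a local-only setup, built from the options in the configuration table. It assumes each dotted option name maps to nested YAML keys (so embedding.local.gpu becomes embedding: local: gpu:); that nesting is inferred from the option names, not confirmed by this page. The values shown are the documented defaults, apart from provider.

```yaml
# Sketch only: key nesting is inferred from the dotted option names.
embedding:
  enabled: true
  provider: local        # stay on-device; do not fall back to OpenAI
  local:
    gpu: auto            # documented alternatives: metal, cuda, vulkan, false
    contextSize: 2048    # maximum input length in tokens (default)
```

Leaving out the local block entirely should be equivalent if the defaults suit your machine.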
OpenAI provider example
~/.comis/config.yaml
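A minimal sketch of an OpenAI-only setup, again assuming that the dotted option names in the configuration table map to nested YAML keys. The OpenAI API key is required but is not shown here, because this page does not say where Comis reads it from.

```yaml
# Sketch only: key nesting is inferred from the dotted option names.
embedding:
  enabled: true
  provider: openai
  openai:
    model: text-embedding-3-small   # default model
    dimensions: 1536                # default vector size for this model
```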
Persistent caching
Comis uses a two-tier embedding cache to avoid redundant API calls:
- L1 (in-memory LRU): Fast access for recent embeddings. Controlled by embedding.cache.maxEntries. Lost on daemon restart.
- L2 (SQLite-backed persistent): Durable storage that survives daemon restarts. Off by default; enabled with embedding.cache.persistent. Embeddings computed once are reused across sessions rather than recomputed, up to the embedding.cache.persistentMaxEntries limit.
~/.comis/config.yaml
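A minimal sketch that turns on the persistent L2 cache alongside the in-memory L1 cache. It assumes the dotted embedding.cache.* option names map to nested YAML keys. The ttlMs value is purely illustrative; the option is unset by default.

```yaml
# Sketch only: key nesting is inferred from the dotted option names.
embedding:
  cache:
    maxEntries: 10000            # L1 in-memory LRU size (0 disables it)
    persistent: true             # enable the L2 SQLite-backed cache
    persistentMaxEntries: 50000  # L2 size limit (default)
    ttlMs: 2592000000            # illustrative: 30 days; omit for LRU eviction only
    pruneIntervalMs: 300000      # check for expired or excess entries every 5 minutes
```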
If you change your embedding provider (for example, switching from local to
OpenAI), Comis automatically re-indexes all existing memories to use the new
model. This happens in the background and does not interrupt normal operation.
You can disable this by setting embedding.autoReindex to false if you prefer
to manage re-indexing manually.
Re-embedding strategy
Switching providers (or models) means existing embeddings no longer match new
ones, because they live in a different vector space. Comis tracks the embedding
fingerprint (provider + model + dimensions) and detects mismatches at startup.
When embedding.autoReindex is true (the default), a background batch indexer
re-embeds every existing memory in batches of embedding.batch.batchSize
(default 100) until the database is consistent again. The agent stays
responsive throughout: re-indexing runs out of band, and search keeps using
whatever embeddings are valid at any given moment.
The fingerprint manager
(packages/memory/src/embedding-fingerprint.ts) and batch indexer
(packages/memory/src/embedding-batch-indexer.ts) handle this automatically.
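The re-indexing behaviour described above is governed by three options from the configuration table. A minimal sketch, assuming the dotted option names map to nested YAML keys; the batch size of 50 is an illustrative choice, not a recommendation from this page.

```yaml
# Sketch only: key nesting is inferred from the dotted option names.
embedding:
  autoReindex: true        # set to false to manage re-indexing manually
  batch:
    batchSize: 50          # illustrative; the default is 100 memories per batch
    indexOnStartup: true   # embed any un-embedded memories when Comis starts
```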
Cost note
OpenAI’s text-embedding-3-small is cheap: roughly $0.02 per 1M tokens
of input. A typical memory entry is 50–200 tokens, so 10,000 memories come to
0.5M–2M tokens, or around $0.01–$0.04 total. Once cached (see
“Persistent caching” above), re-indexing is free.
The local provider has zero per-call cost but uses ~500 MB of disk and some
RAM while running.
Disabling embeddings
If you do not want to use meaning-based search at all, set embedding.enabled
to false. Memory search will still work using text matching only. This is
useful if you want to minimize resource usage or if your agent’s memory is small
enough that text matching is sufficient.
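As a minimal sketch, assuming embedding.enabled is written as a nested YAML key in ~/.comis/config.yaml:

```yaml
# Sketch only: key nesting is inferred from the dotted option name.
embedding:
  enabled: false   # memory search uses text matching only
```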
Search
How text matching and meaning matching work together.
Memory
The different types of memories your agent stores.
