Embeddings - Comis

What it does. Turns each memory into a numerical “fingerprint” (a vector) so Comis can find related memories even when the wording differs. Who it is for. Anyone who wants better recall. The default auto provider works without any setup, so this page is mainly for tuning, pinning a specific provider (for example, forcing local), or controlling cost.

Embeddings power the “meaning matching” side of memory search: they convert text into a numerical representation that lets Comis find memories by meaning, not just by matching words. You do not need to understand the math; choose a provider and Comis handles the rest.

What is wired today

Two embedding implementations ship in packages/memory:

OpenAI embeddings (embedding-provider-openai.ts) — calls the text-embedding-3-small model by default (1536 dimensions).
Local embeddings (embedding-provider-local.ts) — runs a small transformer model entirely on-device using a downloaded GGUF file.

The factory at packages/memory/src/embedding-provider-factory.ts chooses between them based on the embedding.provider setting. The default "auto" mode tries local first and falls back to OpenAI if loading fails.
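The selection rule described above can be sketched in TypeScript as follows. This is a minimal illustration, not the code in embedding-provider-factory.ts: the EmbeddingProvider interface, the ProviderLoaders shape, and the name createEmbeddingProvider are hypothetical, and loading is shown as synchronous for brevity even though real model loading is likely asynchronous.

```typescript
// Hypothetical provider interface; the real one in packages/memory may differ.
interface EmbeddingProvider {
  readonly name: string;
  readonly dimensions: number;
  embed(text: string): number[];
}

type ProviderSetting = "auto" | "local" | "openai";

// Loaders are injected so the selection logic can be shown without real models.
interface ProviderLoaders {
  loadLocal: () => EmbeddingProvider;  // throws if the GGUF model cannot be loaded
  loadOpenAI: () => EmbeddingProvider; // throws if the remote provider cannot be configured
}

function createEmbeddingProvider(
  setting: ProviderSetting,
  loaders: ProviderLoaders,
): EmbeddingProvider {
  switch (setting) {
    case "local":
      // Explicit "local": no fallback, a load failure surfaces to the caller.
      return loaders.loadLocal();
    case "openai":
      return loaders.loadOpenAI();
    case "auto":
      try {
        // Prefer the free, private, on-device model.
        return loaders.loadLocal();
      } catch {
        // Local loading failed: fall back to the remote API.
        return loaders.loadOpenAI();
      }
    default:
      throw new Error(`unknown embedding provider: ${String(setting)}`);
  }
}
```

Injecting the loaders keeps the fallback rule testable on its own: only "auto" swallows a local failure, while an explicit "local" or "openai" setting reports the error.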

Why embeddings matter

Without embeddings, memory search only finds exact word matches. With embeddings, your agent can find related memories even when different words are used. For example:

A memory about “cooking pasta” would match a search for “making spaghetti”
A memory about “budget planning” would match a search for “financial forecasting”
A memory about “fixing the deployment” would match a search for “resolving the release issue”

This makes your agent significantly better at recalling relevant information, especially over long periods where the same topics might be discussed using different terminology.

Embedder vs reranker

Comis uses two different models at two different stages of recall. When both run locally, each is a GGUF model loaded via node-llama-cpp. They are easy to confuse, so the distinction matters:

	Embedder	Reranker
Model type	Bi-encoder	Cross-encoder
Example model	`nomic-embed` (local) / `text-embedding-3-small` (OpenAI)	`bge-reranker-v2-m3` (local GGUF)
Config home	`embedding.*` (e.g. `embedding.local.modelUri`)	`memory.rerankerModel` + `rag.rerank.*`
What it does	Turns each memory into a vector once, ahead of time, feeding the vector search lane	Re-scores already-retrieved candidates by reading the query and candidate together
When it runs	At store/index time (and to embed the query)	After fusion, only on the top candidates
Default	On (the `auto` provider)	Opt-in (`rag.rerank.enabled` defaults `false`)

The embedder is a bi-encoder: it encodes the query and each memory separately into vectors, and relevance is judged by how close those vectors are. This is fast, because memory vectors are computed ahead of time, and it powers the meaning-matching search lane. The reranker is a cross-encoder: it processes the query and a candidate jointly for a sharper relevance judgement, but it is more expensive, so it only re-scores the handful of candidates that already passed fusion, and it is opt-in.
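As an illustration of the bi-encoder pattern, the sketch below scores pre-computed memory vectors against a query vector. Cosine similarity is a common way to compare embeddings; treat it as an assumption here rather than the exact metric Comis uses. The helper names cosineSimilarity and rankBySimilarity are hypothetical.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    // Vectors from different models or dimension settings are not comparable.
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero for empty vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Bi-encoder ranking: memory vectors were embedded ahead of time, so only the
// query needs embedding at search time; scoring is then cheap vector math.
function rankBySimilarity(
  query: readonly number[],
  memories: ReadonlyArray<{ id: string; vector: number[] }>,
  topK: number,
): Array<{ id: string; score: number }> {
  return memories
    .map((m) => ({ id: m.id, score: cosineSimilarity(query, m.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A cross-encoder cannot work this way: it needs the query and each candidate together at scoring time, so nothing can be pre-computed, which is why the reranker is limited to the top candidates.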

The reranker is not an embedding model, and its keys do not live under embedding.*. The reranker model is memory.rerankerModel and its behaviour is tuned under rag.rerank.*. See Search and Memory.

Choosing a provider

Comis supports three embedding provider options:

auto (default): tries local first, then falls back to OpenAI. This is the best option for most users because it works without any setup: Comis automatically downloads a local model and uses it, and if the local model fails to load, it falls back to the OpenAI API.
local: runs entirely on your machine using a downloaded model. It is free, private (no data leaves your machine), and works offline. It needs approximately 500 MB of disk space for the model and some RAM while running. GPU acceleration is used automatically when available.
openai: uses OpenAI’s embedding API. It is fast and accurate but costs a small amount per embedding request (fractions of a cent, both when memories are stored and when search queries are embedded) and requires an OpenAI API key. Choose it if you want the highest-quality embeddings or if your machine lacks the resources for the local model.

The local embedding model downloads automatically on first use. No manual setup is needed — just start Comis and it handles everything.

Configuration

Option	Type	Default	What it does
`embedding.enabled`	boolean	`true`	Enable embedding generation for meaning-based search
`embedding.provider`	string	`"auto"`	Provider: `auto`, `local`, or `openai`
`embedding.local.gpu`	string	`"auto"`	GPU acceleration: `auto`, `metal`, `cuda`, `vulkan`, or `false`
`embedding.local.contextSize`	number	`2048`	Maximum text length the local model can process (in tokens)
`embedding.openai.model`	string	`"text-embedding-3-small"`	OpenAI embedding model to use
`embedding.openai.dimensions`	number	`1536`	Vector dimensions for the OpenAI model
`embedding.cache.maxEntries`	number	`10000`	Maximum cached embeddings in L1 in-memory cache (0 = disabled)
`embedding.cache.persistent`	boolean	`false`	Enable persistent L2 SQLite cache for embeddings that survive restarts
`embedding.cache.persistentMaxEntries`	number	`50000`	Maximum entries in L2 persistent cache
`embedding.cache.ttlMs`	number	(none)	TTL in milliseconds for cache entries. When unset, LRU eviction only
`embedding.cache.pruneIntervalMs`	number	`300000`	How often to check for expired or excess cache entries (5 min default)
`embedding.batch.batchSize`	number	`100`	Number of memories processed per batch during indexing
`embedding.batch.indexOnStartup`	boolean	`true`	Index any un-embedded memories when Comis starts
`embedding.autoReindex`	boolean	`true`	Automatically re-index all memories when the provider changes
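To make the defaults concrete, here is a hypothetical TypeScript shape for the embedding block that mirrors the table, plus a helper that fills in a default for every key you leave out of config.yaml. EmbeddingConfig, EMBEDDING_DEFAULTS, and resolveEmbeddingConfig are illustrative names, not Comis exports, and Comis's own config loader may merge or validate differently.

```typescript
// Hypothetical TypeScript view of the `embedding` block, mirroring the table above.
interface EmbeddingConfig {
  enabled: boolean;
  provider: "auto" | "local" | "openai";
  local: { gpu: string; contextSize: number };
  openai: { model: string; dimensions: number };
  cache: {
    maxEntries: number;
    persistent: boolean;
    persistentMaxEntries: number;
    ttlMs?: number; // unset means LRU eviction only
    pruneIntervalMs: number;
  };
  batch: { batchSize: number; indexOnStartup: boolean };
  autoReindex: boolean;
}

// Every top-level key and every key inside a section may be omitted.
type PartialEmbeddingConfig = {
  [K in keyof EmbeddingConfig]?: EmbeddingConfig[K] extends object
    ? Partial<EmbeddingConfig[K]>
    : EmbeddingConfig[K];
};

// Defaults copied from the configuration table.
const EMBEDDING_DEFAULTS: EmbeddingConfig = {
  enabled: true,
  provider: "auto",
  local: { gpu: "auto", contextSize: 2048 },
  openai: { model: "text-embedding-3-small", dimensions: 1536 },
  cache: { maxEntries: 10000, persistent: false, persistentMaxEntries: 50000, pruneIntervalMs: 300000 },
  batch: { batchSize: 100, indexOnStartup: true },
  autoReindex: true,
};

// Fill in every key the user left out with its documented default.
function resolveEmbeddingConfig(user: PartialEmbeddingConfig = {}): EmbeddingConfig {
  const d = EMBEDDING_DEFAULTS;
  return {
    enabled: user.enabled ?? d.enabled,
    provider: user.provider ?? d.provider,
    local: { ...d.local, ...user.local },
    openai: { ...d.openai, ...user.openai },
    cache: { ...d.cache, ...user.cache },
    batch: { ...d.batch, ...user.batch },
    autoReindex: user.autoReindex ?? d.autoReindex,
  };
}
```

For example, a config that only sets provider: "openai" and cache.persistent: true still ends up with maxEntries 10000, dimensions 1536, and autoReindex enabled.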

Local provider example

~/.comis/config.yaml

embedding:
  enabled: true
  provider: "local"
  local:
    gpu: "auto"
    contextSize: 2048
  cache:
    maxEntries: 10000

OpenAI provider example

~/.comis/config.yaml

embedding:
  enabled: true
  provider: "openai"
  openai:
    model: "text-embedding-3-small"
    dimensions: 1536
  cache:
    maxEntries: 10000

Persistent caching

Comis uses a two-tier embedding cache to avoid redundant embedding calls:

L1 (in-memory LRU): fast access for recent embeddings. Controlled by embedding.cache.maxEntries. Lost on daemon restart.
L2 (SQLite-backed persistent): durable storage that survives daemon restarts. Off by default; enable it with embedding.cache.persistent. Embeddings computed once are reused across sessions until they are evicted (embedding.cache.persistentMaxEntries) or expire (embedding.cache.ttlMs, if set).

When a text needs to be embedded, Comis checks L1 first, then L2, and only calls the provider on a double miss. Cache misses at L1 that hit at L2 are promoted back to L1 for faster subsequent access.
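A minimal sketch of that lookup order is below. The PersistentCache interface stands in for the SQLite-backed L2 store, and LruCache, TwoTierEmbeddingCache, and the fingerprint-plus-text cache key are hypothetical choices for illustration (a real cache would more likely hash the text).

```typescript
// Stand-in for the SQLite-backed L2 store (hypothetical interface).
interface PersistentCache {
  get(key: string): number[] | undefined;
  set(key: string, vector: number[]): void;
}

// L1: a tiny LRU built on Map, which iterates keys in insertion order.
class LruCache {
  private readonly map = new Map<string, number[]>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: string): number[] | undefined {
    const v = this.map.get(key);
    if (v !== undefined) {
      // Re-insert to mark the entry as most recently used.
      this.map.delete(key);
      this.map.set(key, v);
    }
    return v;
  }

  set(key: string, v: number[]): void {
    if (this.maxEntries <= 0) return; // 0 disables L1, as with embedding.cache.maxEntries: 0
    this.map.delete(key);
    this.map.set(key, v);
    if (this.map.size > this.maxEntries) {
      const oldest = this.map.keys().next().value; // least recently used key
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }
}

class TwoTierEmbeddingCache {
  private readonly l1: LruCache;
  private readonly l2: PersistentCache | undefined; // undefined when persistent caching is off
  private readonly embed: (text: string) => number[];
  private readonly fingerprint: string;

  constructor(l1: LruCache, l2: PersistentCache | undefined, embed: (text: string) => number[], fingerprint: string) {
    this.l1 = l1;
    this.l2 = l2;
    this.embed = embed;
    this.fingerprint = fingerprint;
  }

  get(text: string): number[] {
    // Assumption: keys include the model fingerprint so vectors from different models never mix.
    const key = `${this.fingerprint}|${text}`;
    const fromL1 = this.l1.get(key);
    if (fromL1 !== undefined) return fromL1;
    const fromL2 = this.l2?.get(key);
    if (fromL2 !== undefined) {
      this.l1.set(key, fromL2); // promote the L2 hit into L1
      return fromL2;
    }
    // Double miss: only now call the provider, then fill both tiers.
    const vector = this.embed(text);
    this.l1.set(key, vector);
    this.l2?.set(key, vector);
    return vector;
  }
}
```

Recreating TwoTierEmbeddingCache with a fresh LruCache but the same PersistentCache simulates a daemon restart: the first lookup is served from L2 and promoted into L1 without calling the provider.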

~/.comis/config.yaml

embedding:
  cache:
    maxEntries: 10000
    persistent: true
    persistentMaxEntries: 50000
    pruneIntervalMs: 300000

Enable persistent caching if your agents frequently discuss the same topics. The L2 cache eliminates redundant embedding API calls after restarts, saving both time and money.

If you change your embedding provider (for example, switching from local to OpenAI), Comis automatically re-indexes all existing memories with the new model. This happens in the background and does not interrupt normal operation. You can disable it with embedding.autoReindex: false if you prefer to manage re-indexing manually.

Re-embedding strategy

Switching providers (or models) means existing embeddings no longer match new ones — they live in a different vector space. Comis tracks the embedding fingerprint (provider + model + dimensions) and detects mismatches at startup. When embedding.autoReindex: true (the default), a background batch indexer re-embeds every existing memory in batches of embedding.batch.batchSize (default 100) until the database is consistent again. The agent stays responsive throughout — re-indexing runs out of band, and search keeps using whatever embeddings are valid at any given moment. The fingerprint manager (packages/memory/src/embedding-fingerprint.ts) and batch indexer (packages/memory/src/embedding-batch-indexer.ts) handle this automatically.
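The mismatch check and batching can be sketched as follows. The fingerprint fields (provider, model, dimensions) and the batch size default come from the description above; the string key format, the StoredMemory record, and reindexStale are hypothetical, and the loop runs synchronously here, whereas Comis runs it in the background.

```typescript
// Fingerprint fields as described above; the key format is an assumption.
interface EmbeddingFingerprint {
  provider: string;
  model: string;
  dimensions: number;
}

function fingerprintKey(fp: EmbeddingFingerprint): string {
  return `${fp.provider}:${fp.model}:${fp.dimensions}`;
}

// Hypothetical stored record: each vector remembers which model produced it.
interface StoredMemory {
  id: string;
  text: string;
  fingerprint?: string; // undefined for memories that were never embedded
  vector?: number[];
}

// Re-embed every memory whose fingerprint differs from the active one,
// batchSize memories per provider call. Returns how many were re-embedded.
function reindexStale(
  memories: StoredMemory[],
  active: EmbeddingFingerprint,
  embedBatch: (texts: string[]) => number[][],
  batchSize = 100,
): number {
  const key = fingerprintKey(active);
  const stale = memories.filter((m) => m.fingerprint !== key);
  for (let i = 0; i < stale.length; i += batchSize) {
    const batch = stale.slice(i, i + batchSize);
    const vectors = embedBatch(batch.map((m) => m.text));
    batch.forEach((m, j) => {
      m.vector = vectors[j];
      m.fingerprint = key; // now consistent with the active model
    });
  }
  return stale.length;
}
```

With 250 memories of which 5 already match the active fingerprint, this re-embeds 245 memories in three batches of 100, 100, and 45.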

Cost note

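OpenAI’s text-embedding-3-small is cheap: roughly $0.02 per 1M input tokens. A typical memory entry is 50–200 tokens, so embedding 10,000 memories (500,000 to 2,000,000 tokens) costs around $0.01–$0.04 in total. With persistent caching enabled (see “Persistent caching” above), re-embedding text that is already cached for the same model does not call the API again; switching to a different model still requires fresh embeddings. The local provider has no per-call cost but uses about 500 MB of disk and some RAM while running.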
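The arithmetic above as a small helper, using the $0.02 per 1M tokens figure quoted on this page (embeddingCostUsd is an illustrative name):

```typescript
// Price per 1M input tokens, as quoted on this page.
const USD_PER_MILLION_TOKENS = 0.02;

// Total cost of embedding `memoryCount` memories of `tokensPerMemory` tokens each.
function embeddingCostUsd(
  memoryCount: number,
  tokensPerMemory: number,
  usdPerMillion: number = USD_PER_MILLION_TOKENS,
): number {
  const totalTokens = memoryCount * tokensPerMemory;
  return (totalTokens / 1_000_000) * usdPerMillion;
}

console.log(embeddingCostUsd(10_000, 50).toFixed(2));  // 500,000 tokens → "0.01"
console.log(embeddingCostUsd(10_000, 200).toFixed(2)); // 2,000,000 tokens → "0.04"
```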

Disabling embeddings

If you do not want to use meaning-based search at all, set embedding.enabled to false. Memory search will still work using text matching only. This is useful if you want to minimize resource usage or if your agent’s memory is small enough that text matching is sufficient.

Search

How text matching and meaning matching work together.

Memory

The different types of memories your agent stores.
