What it does. Finds the most relevant memories for a question your agent is about to answer, by combining keyword search with meaning-based search.
Who it is for. Mostly automatic: every agent reply uses it. You only need to read this if you are tuning recall quality or calling search yourself via RPC.
When your agent needs to recall something, Comis searches through its memory using two complementary approaches: text matching and meaning matching. Together, these give your agent the best chance of finding relevant information.

How search works

Comis uses two search methods and combines their results:
  • Text matching — Comis looks for memories that contain the same words as the search query. This is fast and precise for exact topics. If you ask about “project deadlines,” text matching finds memories that literally contain the words “project” and “deadlines.”
  • Meaning matching — Comis compares the meaning of your query against stored memories, finding related information even if the exact words are different. If you ask about “project deadlines,” meaning matching might also find a memory about “task due dates” because the meaning is similar. This uses embeddings to understand meaning.
  • Combined results — The lanes are merged into a single ranked list, duplicates are removed, and the most relevant results are ranked to the top. This combination means your agent benefits from both precise word matching and flexible meaning matching.
When the optional entity lane is enabled, its associative hits join the same merge as a third lane.
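The difference between the two methods can be illustrated with a toy sketch. This is not Comis code: the function names and the two-dimensional vectors are made up for illustration, whereas real embeddings come from a model and have many more dimensions.

```typescript
// Split text into lowercase words (toy tokenizer, ASCII only).
function tokenize(s: string): string[] {
  return s.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
}

// Text matching: count distinct query words that literally appear in the memory.
function keywordOverlap(query: string, memory: string): number {
  const memoryWords = tokenize(memory);
  return tokenize(query).filter(
    (w, i, all) => all.indexOf(w) === i && memoryWords.indexOf(w) !== -1,
  ).length;
}

// Meaning matching: cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Hypothetical embeddings, hand-picked so related topics point the same way.
const projectDeadlines = [0.9, 0.1];
const taskDueDates = [0.8, 0.2];
const pastaRecipe = [0.1, 0.9];

// No shared words, so text matching alone would miss this memory...
const overlap = keywordOverlap("project deadlines", "task due dates");
// ...but the vectors are close, so meaning matching can still find it.
const related = cosineSimilarity(projectDeadlines, taskDueDates);
const unrelated = cosineSimilarity(projectDeadlines, pastaRecipe);
```

Here `overlap` is zero while `related` is much larger than `unrelated`, which is the gap the meaning-matching lane is meant to close.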

When search is used

Memory search happens automatically in two situations:
  1. During memory recall (RAG) — Before every agent response, Comis searches for relevant memories and includes them in the agent’s context. See Memory Recall (RAG) for how this works.
  2. Through memory tools — Your agent can explicitly search its memory as part of its reasoning process. For example, if you ask “what did we decide about the logo?” the agent might search its memory for previous discussions about logos.

Search quality

The quality of search results depends on which search methods are available:
  • Both methods active (recommended) — The best results come from combining text matching and meaning matching. This is the default when embeddings are configured.
  • Text matching only — If embeddings are not available, search still works but only finds memories with matching words. A search for “pasta recipe” would not find a memory about “how to cook spaghetti” because the words are different even though the meaning is the same.
For the best search experience, make sure embeddings are set up. The default auto provider handles this automatically.
If the meaning-matching system is not available (embeddings not configured), Comis falls back to text matching only. Search still works — it is just less flexible at finding related information when different words are used.

Relevance scoring

Each search result gets a relevance score between 0 and 1. Higher scores mean a closer match to your query. The RAG configuration lets you set a minimum score threshold (minScore, default 0.1) so only sufficiently relevant memories are included. You generally do not need to adjust the scoring — the defaults work well for most use cases. If your agent is recalling too many irrelevant memories, increase minScore. If it is missing important context, lower it.
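The effect of the threshold can be sketched as a simple filter. Only minScore and its default of 0.1 come from this page. The `ScoredMemory` shape, the `applyMinScore` name, and the choice to treat the threshold as inclusive are assumptions made for illustration.

```typescript
// Hypothetical result shape: an id plus a relevance score between 0 and 1.
interface ScoredMemory {
  id: string;
  score: number;
}

// Keep only results at or above the threshold (inclusive comparison is an assumption).
function applyMinScore(results: ScoredMemory[], minScore = 0.1): ScoredMemory[] {
  return results.filter((r) => r.score >= minScore);
}

const results: ScoredMemory[] = [
  { id: "logo-decision", score: 0.82 },
  { id: "brand-colors", score: 0.1 },
  { id: "lunch-order", score: 0.04 },
];

// Default threshold drops only the weakest match.
const withDefault = applyMinScore(results);
// A stricter threshold keeps only the strongest match.
const strict = applyMinScore(results, 0.5);
```

Raising the threshold trades recall for precision: `strict` contains fewer memories than `withDefault`, but each one is a closer match.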

Reference: how the lanes are merged

The hybrid search lives in packages/memory/src/hybrid-search.ts. Several retrieval lanes combine into the final ranking:
  1. FTS5 BM25 — The full-text index (memory_fts) returns a BM25 rank per match (lower is better). Stop words are filtered for Latin scripts; Unicode-aware tokenisation handles CJK, Cyrillic, and Arabic without stripping content.
  2. sqlite-vec cosine KNN — When embeddings exist for stored memories, a vector KNN query against vec_memories returns a similarity-ranked list.
  3. Entity lane (opt-in) — When rag.entityLane.enabled is on, a one-hop self-join over shared linked entities contributes a third ranked list. See Memory.
  4. N-lane Reciprocal Rank Fusion (RRF) — All active lanes are merged using the classic RRF formula: each lane that returns a memory contributes 1 / (k + rank) with k = 60, and the memory’s fused score is the sum of those contributions. RRF uses only ranks, so it is robust against the lanes’ score scales being incompatible, and adding or dropping a lane (e.g. the entity lane, or vector when no embeddings exist) just changes how many lists feed the same formula.
If sqlite-vec is not available or the query has no embedding, search falls back gracefully to FTS5 only.
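The fusion step can be sketched as follows. This is a minimal illustration of classic RRF, not the code in hybrid-search.ts: the function name, the use of plain id lists as lanes, and 1-based ranks are assumptions.

```typescript
// Each lane is a list of memory ids, best match first.
// A memory's fused score is the sum of 1 / (k + rank) over every lane that returned it.
function reciprocalRankFusion(
  lanes: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores: Record<string, number> = {};
  for (const lane of lanes) {
    lane.forEach((id, index) => {
      const rank = index + 1; // 1-based rank within this lane (assumption)
      scores[id] = (scores[id] ?? 0) + 1 / (k + rank);
    });
  }
  // Highest fused score first; each id appears once, so duplicates are merged.
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical lane outputs for one query.
const ftsLane = ["a", "b", "c"];
const vecLane = ["b", "d"];

// "b" appears in both lanes, so it outranks "a" even though "a" won the FTS lane.
const fused = reciprocalRankFusion([ftsLane, vecLane]);

// With a single lane (for example FTS5 only), the lane's own order is preserved.
const ftsOnly = reciprocalRankFusion([ftsLane]);
```

Because the formula looks only at ranks, a BM25 rank and a cosine similarity never have to be converted to a common scale before they are combined.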

Cross-encoder reranking (opt-in)

After fusion, recall can optionally rerank the top fused candidates with a cross-encoder before scoring. This stage is opt-in (rag.rerank.enabled defaults to false), so by default the fused RRF order is the recall order. When rag.rerank.enabled: true:
  • The top fused candidates (cap rag.rerank.maxCandidates, default 40) are re-scored by an on-device cross-encoder (bge-reranker-v2-m3 Q8_0, run through node-llama-cpp — the same runtime used for local embeddings).
  • A cross-encoder reads the query and a candidate together and emits a direct relevance score, so after reranking the cross-encoder score becomes the primary order and the fused RRF order is discarded.
  • The stage is bounded by rag.rerank.timeoutMs (default 800); if it times out or the reranker is unavailable, recall falls back to the fused order.
A cross-encoder is a different model from the embedder used for the vector lane (a bi-encoder). See Embeddings for the distinction, and Memory for the full configuration and graceful-degradation behaviour.
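The rerank-or-fall-back behaviour described above might look roughly like this. It is a sketch under assumptions: the `Candidate` and `Reranker` types and all function names are hypothetical, the reranker is injected rather than loaded through node-llama-cpp, and candidates beyond the cap are assumed to keep their fused order after the reranked head.

```typescript
interface Candidate {
  id: string;
  text: string;
}

// Hypothetical cross-encoder interface: one relevance score per candidate text.
type Reranker = (query: string, texts: string[]) => Promise<number[]>;

// Reorder the top candidates by cross-encoder score.
// Null or mismatched scores (timeout, unavailable reranker) keep the fused order.
function applyRerankScores(
  fused: Candidate[],
  scores: number[] | null,
  maxCandidates = 40,
): Candidate[] {
  const head = fused.slice(0, maxCandidates);
  if (scores === null || scores.length !== head.length) return fused.slice();
  const reranked = head
    .map((candidate, i) => ({ candidate, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.candidate);
  return reranked.concat(fused.slice(maxCandidates));
}

// A promise that resolves to null after `ms`, plus a way to cancel the timer.
function timeoutAfter(ms: number): { promise: Promise<null>; cancel: () => void } {
  let cancel: () => void = () => {};
  const promise = new Promise<null>((resolve) => {
    const timer = setTimeout(() => resolve(null), ms);
    cancel = () => clearTimeout(timer);
  });
  return { promise, cancel };
}

async function rerankWithTimeout(
  query: string,
  fused: Candidate[],
  reranker: Reranker,
  maxCandidates = 40,
  timeoutMs = 800,
): Promise<Candidate[]> {
  const head = fused.slice(0, maxCandidates);
  const timeout = timeoutAfter(timeoutMs);
  try {
    // Whichever settles first wins: real scores, or null from the timer.
    const scores = await Promise.race([
      reranker(query, head.map((c) => c.text)),
      timeout.promise,
    ]);
    return applyRerankScores(fused, scores, maxCandidates);
  } catch {
    return fused.slice(); // reranker failed: fall back to the fused order
  } finally {
    timeout.cancel();
  }
}
```

Keeping the reordering in a pure function separates the ordering rule from the timing concern, so the fallback path is the same whether the reranker times out, throws, or returns an unexpected number of scores.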

Filters supported

Search results can be filtered by:
  • tenantId — multi-tenant isolation (required, set by the caller).
  • agentId — restrict to memories owned by one agent.
  • trustLevel — system, learned, or external (see Memory).
  • memoryType — working, episodic, semantic, or procedural.
  • expiresAt — already-expired entries are silently excluded.
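As a rough illustration, the filters above behave like a predicate applied to each candidate memory. The record and filter shapes below are assumptions built from the field names in the list; the real filtering may well happen inside the SQL queries rather than in application code, and the representation of expiresAt as epoch milliseconds is hypothetical.

```typescript
type TrustLevel = "system" | "learned" | "external";
type MemoryType = "working" | "episodic" | "semantic" | "procedural";

// Hypothetical record shape using the field names from the list above.
interface MemoryRecord {
  tenantId: string;
  agentId: string;
  trustLevel: TrustLevel;
  memoryType: MemoryType;
  expiresAt?: number; // epoch milliseconds; undefined means "never expires" (assumption)
}

// Hypothetical filter shape: tenantId is required, everything else is optional.
interface SearchFilter {
  tenantId: string;
  agentId?: string;
  trustLevels?: TrustLevel[];
  memoryTypes?: MemoryType[];
}

function matchesFilter(memory: MemoryRecord, filter: SearchFilter, now: number): boolean {
  if (memory.tenantId !== filter.tenantId) return false; // tenant isolation always applies
  if (filter.agentId !== undefined && memory.agentId !== filter.agentId) return false;
  if (filter.trustLevels && filter.trustLevels.indexOf(memory.trustLevel) === -1) return false;
  if (filter.memoryTypes && filter.memoryTypes.indexOf(memory.memoryType) === -1) return false;
  // Already-expired entries are excluded without any error.
  if (memory.expiresAt !== undefined && memory.expiresAt <= now) return false;
  return true;
}

const example: MemoryRecord = {
  tenantId: "acme",
  agentId: "support-bot",
  trustLevel: "learned",
  memoryType: "semantic",
  expiresAt: 2000,
};

const visible = matchesFilter(example, { tenantId: "acme", trustLevels: ["system", "learned"] }, 1000);
const expired = matchesFilter(example, { tenantId: "acme" }, 3000);
```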

Calling search from outside the agent

The web dashboard and external integrations call search through the JSON-RPC endpoint. The handler is registered as memory.search:
curl -X POST http://localhost:7777/rpc \
  -H "Authorization: Bearer $COMIS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "memory.search",
    "params": {
      "query": "what did we decide about the logo?",
      "limit": 5,
      "minScore": 0.1,
      "trustLevels": ["system", "learned"]
    },
    "id": 1
  }'
The response contains the ranked results with their RRF score, source (fts, vec, or both), and the underlying memory record. The memory.browse RPC offers a non-search paginated view of memories when you just want to list them by date.
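The same call can be made from TypeScript. The sketch below mirrors the curl request above; the parameter names come from that request, while the function names, the `RpcTransport` type, and the untyped return value are assumptions, since the exact response shape is not specified here. The transport is injected so the sketch does not depend on a particular HTTP client.

```typescript
interface MemorySearchParams {
  query: string;
  limit?: number;
  minScore?: number;
  trustLevels?: string[];
}

// Build the JSON-RPC 2.0 envelope for a memory.search call.
function buildSearchRequest(params: MemorySearchParams, id = 1) {
  return { jsonrpc: "2.0", method: "memory.search", params, id };
}

// Minimal fetch-like signature, so any compatible HTTP client can be passed in.
type RpcTransport = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function searchMemory(
  send: RpcTransport,
  baseUrl: string,
  token: string,
  params: MemorySearchParams,
): Promise<unknown> {
  const response = await send(`${baseUrl}/rpc`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildSearchRequest(params)),
  });
  if (!response.ok) {
    throw new Error(`memory.search failed with HTTP ${response.status}`);
  }
  return response.json(); // response shape is left untyped on purpose
}

const request = buildSearchRequest({
  query: "what did we decide about the logo?",
  limit: 5,
  minScore: 0.1,
  trustLevels: ["system", "learned"],
});
```

On a runtime with a global `fetch`, passing `fetch` as the `send` argument along with `http://localhost:7777` and the bearer token should reproduce the curl call.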

Configuration

Search does not have its own configuration section. Search behavior is controlled through:
  • Memory — Where memories are stored and how they are organized.
  • RAG — How many results to include, minimum relevance score, and trust level filtering.
  • Embeddings — Whether meaning matching is available and which provider powers it.

Memory

The different types of memories your agent stores.

Embeddings

Setting up meaning-based search for better results.

Memory Recall (RAG)

How recalled memories are used in agent responses.