You don’t need to understand the technical details to use this feature. The configuration examples below are copy-paste ready.
Document extractors (in `packages/skills/src/integrations/document/`):

- PDFs: `pdf-extractor.ts` uses pdfjs, with an optional vision OCR fallback for scanned pages.
- CSV / spreadsheets: `file-extractor.ts` parses each row inline.
- Plain text and code: `text-decoder.ts` handles UTF-8, ASCII, and other charsets.
- Composite auto-detect: `composite-extractor.ts` picks the right extractor based on MIME magic bytes plus the filename extension.
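
To make the composite auto-detect step concrete, here is a minimal TypeScript sketch of routing a file by its magic bytes first and its filename extension second. It is not the code in `composite-extractor.ts`: the `pickExtractor` function, the `ExtractorName` strings, and the extension set are hypothetical, and only the PDF signature (`%PDF-`) is checked.

```typescript
// Hypothetical sketch of composite auto-detection; not Comis's actual implementation.
// Names mirror the files listed above, but the routing rules are assumptions.
type ExtractorName = "pdf-extractor" | "file-extractor" | "text-decoder";

// "%PDF-" is the standard signature at the start of a PDF file.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Assumed set of text/code extensions, taken from the supported-types table.
const TEXT_EXTENSIONS = new Set(["txt", "md", "html", "xml", "json", "yaml", "yml", "js", "py", "ts", "sh"]);

function startsWith(bytes: Uint8Array, magic: number[]): boolean {
  return magic.length <= bytes.length && magic.every((b, i) => bytes[i] === b);
}

function pickExtractor(bytes: Uint8Array, filename: string): ExtractorName | null {
  // Magic bytes win: a PDF renamed to .txt is still routed to the PDF extractor.
  if (startsWith(bytes, PDF_MAGIC)) return "pdf-extractor";
  // Otherwise fall back to the lowercase extension after the last dot.
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  if (ext === "csv") return "file-extractor";
  if (TEXT_EXTENSIONS.has(ext)) return "text-decoder";
  return null; // unsupported: no extractor applies
}

const renamedPdf = Uint8Array.from("%PDF-1.7", (c) => c.charCodeAt(0));
console.log(pickExtractor(renamedPdf, "scan.txt")); // "pdf-extractor"
```

Checking content before the extension means a mislabeled upload still reaches the right extractor; the extension only decides among formats that have no reliable signature.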
Supported File Types
Comis can extract text from all of the following file types out of the box. If someone sends one of these files to your agent, the content is available immediately.

| File Type | Extension | MIME Type |
|---|---|---|
| Plain text | .txt | text/plain |
| CSV | .csv | text/csv |
| Markdown | .md | text/markdown |
| HTML | .html | text/html |
| XML | .xml | text/xml, application/xml |
| JSON | .json | application/json |
| PDF | .pdf | application/pdf |
| YAML | .yaml, .yml | text/yaml, application/x-yaml |
| JavaScript | .js | text/javascript |
| Python | .py | text/x-python |
| TypeScript | .ts | text/x-typescript |
| Shell scripts | .sh | application/x-sh |
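
The table can be read as a simple MIME lookup. The sketch below is a hypothetical `isExtractable` helper. It assumes `allowedMimes` adds entries to the built-in list (as the next paragraph describes) and that MIME parameters such as `; charset=utf-8` are ignored; neither detail is confirmed for Comis.

```typescript
// Hypothetical sketch: built-in MIME types from the table above,
// extended by user-supplied allowedMimes entries (assumed additive).
const BUILT_IN_MIMES: ReadonlySet<string> = new Set([
  "text/plain", "text/csv", "text/markdown", "text/html",
  "text/xml", "application/xml", "application/json", "application/pdf",
  "text/yaml", "application/x-yaml", "text/javascript",
  "text/x-python", "text/x-typescript", "application/x-sh",
]);

function isExtractable(mime: string, allowedMimes: readonly string[] = []): boolean {
  // Assumption: drop parameters like "; charset=utf-8" and compare case-insensitively.
  const base = mime.split(";")[0].trim().toLowerCase();
  return BUILT_IN_MIMES.has(base) || allowedMimes.some((m) => m.toLowerCase() === base);
}

console.log(isExtractable("text/csv; charset=utf-8")); // true
console.log(isExtractable("text/x-rust", ["text/x-rust"])); // true
```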
The list above covers most common document types. If you need to support additional MIME types, you can extend the list using the `allowedMimes` configuration option (see the Configuration section below).

How It Works
Automatic Extraction
When `documentExtraction.enabled` is `true` (the default), any supported document attached to a message is automatically extracted before your agent processes the message. The agent sees the document text inline, as if the user had pasted the content directly into the chat. No tool call is needed: the content is simply there when the agent starts processing.
For example, if someone sends a CSV file, your agent sees the raw CSV data
and can answer questions about it, summarize it, or extract specific values —
all without any extra steps. If someone sends a Python script, the agent sees
the source code and can review it, explain what it does, or suggest
improvements.
When multiple documents are attached to a single message, Comis extracts all of them (up to the `maxTotalChars` limit). The agent sees the content of each document labeled by filename and type.
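
Here is a rough sketch of how several attachments might be inlined under one `maxTotalChars` budget. The `[Document: ...]` label format, the `inlineDocuments` name, and the choice to truncate the last document that fits (rather than drop it) are all assumptions for illustration, not Comis behavior.

```typescript
// Hypothetical sketch: inline extracted documents into the message text,
// labeling each by filename and MIME type under a shared character budget.
interface ExtractedDoc {
  filename: string;
  mime: string;
  text: string;
}

function inlineDocuments(userText: string, docs: ExtractedDoc[], maxTotalChars = 500_000): string {
  const parts = [userText];
  let remaining = maxTotalChars;
  for (const doc of docs) {
    if (remaining <= 0) break; // budget exhausted: later attachments are not inlined
    const body = doc.text.slice(0, remaining);
    remaining -= body.length;
    // Assumed label format; marks a document that was cut short by the budget.
    const note = body.length < doc.text.length ? " (truncated)" : "";
    parts.push(`[Document: ${doc.filename} (${doc.mime})${note}]\n${body}`);
  }
  return parts.join("\n\n");
}
```

Sharing one budget across attachments keeps a message with many files from crowding out the rest of the conversation, at the cost of later files receiving less (or no) inline text.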
On-Demand Extraction
Your agent can also use the `extract_document` tool to process documents explicitly. This is useful when:
- The agent wants to extract a document that was shared earlier in the conversation
- Automatic extraction is disabled and the agent needs to read a file
- The agent wants to process a document from a URL rather than an attachment
- A document exceeds automatic extraction limits and the agent wants to try extracting a specific portion

See the Media Tools page for `extract_document` tool parameters and usage examples.
PDF Handling
PDFs have special handling because they come in two varieties:

- Text-based PDFs: These are normal PDFs where the text can be selected and copied. Comis extracts the text directly, numbering each page so your agent knows which content comes from where.
- Scanned PDFs (images of text): These are PDFs created from scanned paper documents or photos. The pages are images, so normal text extraction returns little or no text.

By default, scanned PDFs produce empty or minimal results. To handle scanned PDFs, enable the vision OCR fallback. When enabled, Comis detects pages with very little text and sends them to your configured vision AI provider to read the text from the image.

`pdfImageFallbackThreshold` controls when OCR kicks in. If a page has fewer than this many characters after normal text extraction, Comis treats it as a scanned page and uses vision AI to read it. The default threshold is 50 characters.
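
The threshold rule comes down to a per-page comparison. This hypothetical `pagesNeedingOcr` helper returns the 1-based page numbers that would be sent to the vision provider. Whether Comis trims whitespace before counting characters is an assumption here.

```typescript
// Hypothetical sketch of the pdfImageFallbackThreshold check.
// pageTexts stands for the per-page output of normal text extraction.
function pagesNeedingOcr(pageTexts: string[], threshold = 50): number[] {
  const pages: number[] = [];
  pageTexts.forEach((text, i) => {
    // Assumption: whitespace-only output counts as no text.
    if (text.trim().length < threshold) pages.push(i + 1); // 1-based page numbers
  });
  return pages;
}

console.log(pagesNeedingOcr(["A".repeat(400), "", "  page 3  "])); // [ 2, 3 ]
```

Raising the threshold sends more borderline pages (for example, a scanned page with a short typed caption) to OCR; lowering it saves vision calls on pages that are mostly images.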
PDF OCR uses your configured vision provider, so make sure you have at least
one vision provider set up if you enable this feature. See
Vision for provider setup.
What Happens Without Document Extraction
If document extraction is disabled or the file type is not supported, your agent still knows a document was sent. It receives a hint like:
“Attached: report.pdf (application/pdf) — use extract_document tool to
read”
The agent can then call the `extract_document` tool on demand, if that tool is available. This means documents are never silently ignored: your agent always knows when a file was shared, even if automatic extraction is not configured.
For unsupported file types (like .zip archives or binary executables), the
agent sees the filename and MIME type but cannot extract text content. It can
still acknowledge the file and let the user know that the file type is not
supported for text extraction.
Limits
You can cap how much data and time document extraction consumes with these settings:

| Setting | Default | Description |
|---|---|---|
| `maxBytes` | 10 MB | Maximum file size for extraction |
| `maxChars` | 200,000 | Maximum characters in extracted text |
| `maxTotalChars` | 500,000 | Maximum total characters across all attachments in a single message |
| `maxPages` | 20 | Maximum pages for paginated documents like PDFs |
| `timeoutMs` | 30,000 ms | Extraction timeout per document |

When a limit is reached:

- If a PDF has 50 pages but `maxPages` is set to 20, only the first 20 pages are extracted and your agent is informed that the document was truncated.
- If a document’s extracted text exceeds `maxChars`, the text is cut off at the limit and the agent knows the content was truncated.
- If a file exceeds `maxBytes`, extraction is skipped entirely and the agent receives a hint about the file instead.
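
The three limit rules can be sketched as a single function. `applyLimits`, the `Limits` and `LimitResult` shapes, and joining pages with newlines are hypothetical; `timeoutMs` and `maxTotalChars` are left out to keep the example small.

```typescript
// Hypothetical sketch of the maxBytes / maxPages / maxChars rules above.
// The input and result shapes are illustrative, not Comis's internal API.
interface Limits {
  maxBytes: number;
  maxChars: number;
  maxPages: number;
}

type LimitResult =
  | { kind: "skipped"; reason: string }
  | { kind: "extracted"; text: string; truncated: boolean };

function applyLimits(sizeBytes: number, pages: string[], limits: Limits): LimitResult {
  // Oversized files are never extracted; the agent only gets a hint.
  if (sizeBytes > limits.maxBytes) {
    return { kind: "skipped", reason: `file exceeds maxBytes (${limits.maxBytes})` };
  }
  // Keep only the first maxPages pages, then cap the joined text at maxChars.
  const kept = pages.slice(0, limits.maxPages);
  const joined = kept.join("\n");
  const text = joined.slice(0, limits.maxChars);
  // Either cut counts as truncation, so the agent can be told content is missing.
  const truncated = kept.length < pages.length || text.length < joined.length;
  return { kind: "extracted", text, truncated };
}
```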
Configuration
All of the settings on this page (`enabled`, `allowedMimes`, `pdfImageFallbackThreshold`, and the limits in the table above) live under `integrations.media.documentExtraction` in your `config.yaml`.
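
As a reading aid, here are the documented settings and defaults expressed as a TypeScript shape, with user overrides shallow-merged on top. This is not Comis's config schema: the interface, `resolveConfig`, the byte value used for "10 MB", and the empty `allowedMimes` default are assumptions. The option that turns on PDF OCR is omitted because its name is not given on this page.

```typescript
// Hypothetical sketch of integrations.media.documentExtraction defaults.
// Field names come from this page; the TypeScript shape is illustrative only.
interface DocumentExtractionConfig {
  enabled: boolean;
  maxBytes: number;
  maxChars: number;
  maxTotalChars: number;
  maxPages: number;
  timeoutMs: number;
  pdfImageFallbackThreshold: number;
  allowedMimes: string[];
}

const DEFAULTS: DocumentExtractionConfig = {
  enabled: true,
  maxBytes: 10 * 1024 * 1024, // "10 MB"; assumed to mean 10 MiB
  maxChars: 200_000,
  maxTotalChars: 500_000,
  maxPages: 20,
  timeoutMs: 30_000,
  pdfImageFallbackThreshold: 50,
  allowedMimes: [], // assumed default: built-in types only
};

function resolveConfig(overrides: Partial<DocumentExtractionConfig> = {}): DocumentExtractionConfig {
  // Shallow merge: any key set in config.yaml replaces the default for that key.
  return { ...DEFAULTS, ...overrides };
}

console.log(resolveConfig({ maxPages: 50 }).maxPages); // 50
```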
Related
- Vision: Image analysis powers PDF OCR fallback
- Media Tools: The `extract_document` tool reference
- Media Overview: Back to media overview
- Config Reference: Full configuration reference
