Routes
- /media: opens the Media Test playground
- /media/config: opens the Media Configuration panel
Backing RPC
- media.providers: which providers are configured for each capability and whether the credentials work
- media.transcribe, media.synthesize, media.analyzeImage, media.describeVideo, media.extractDocument, media.fetchLink: one RPC per capability
- POST /media/upload (HTTP, not RPC): multipart upload for audio/image/document files
- config.patch(section: "media", ...): saves provider/model changes made via the Edit in Config Editor links
What You See
The Media section is split into two views, both accessible from the sidebar under a single “Media” item.
Media Test Playground
The test view has six tabs, one for each media processing capability:
| Tab | What It Tests |
|---|---|
| STT | Speech-to-Text — upload an audio file (up to 25 MB) and get a transcription with provider info |
| TTS | Text-to-Speech — enter text, synthesize speech, and play the audio in your browser |
| Vision | Image analysis — upload an image (up to 20 MB) and get a description or analysis from the vision provider |
| Document | Document extraction — upload a PDF, CSV, or other document (up to 50 MB) and get the extracted text content |
| Video | Video analysis — upload a video file (up to 50 MB) and get a description or analysis |
| Link | Link enrichment — enter a URL and get the extracted content, title, and metadata |
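The size limits in the table can be checked before a file is sent. The following is a minimal sketch, not the playground's actual code: the names MAX_UPLOAD_MB and checkUploadSize are hypothetical, and it assumes that "MB" means 1024 × 1024 bytes, which the table does not specify.

```typescript
// Upload limits in megabytes, taken from the tab table above.
// TTS and Link take text/URL input, so they have no upload limit here.
const MAX_UPLOAD_MB = {
  STT: 25,
  Vision: 20,
  Document: 50,
  Video: 50,
} as const;

type UploadTab = keyof typeof MAX_UPLOAD_MB;

// Assumption: 1 MB = 1024 * 1024 bytes. The docs do not say which unit is used.
const BYTES_PER_MB = 1024 * 1024;

// Returns an error message if the file exceeds the tab's limit, or null if it fits.
function checkUploadSize(tab: UploadTab, sizeInBytes: number): string | null {
  const limitMb = MAX_UPLOAD_MB[tab];
  if (sizeInBytes > limitMb * BYTES_PER_MB) {
    return `${tab} uploads are limited to ${limitMb} MB`;
  }
  return null;
}
```

A file exactly at the limit passes. The server may apply its own check regardless of what the client does.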
How Testing Works
Each tab follows the same pattern:
- Select or upload the input (file upload or text/URL input)
- Click the test button
- View the result, which includes the processing output and provider information
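The shared pattern can be sketched as a lookup from tab to RPC method and input kind. The RPC names come from the Backing RPC list; the pairing of each tab with a method is inferred from the names, and TAB_SPECS and needsUpload are hypothetical names used only for illustration.

```typescript
type MediaTab = "STT" | "TTS" | "Vision" | "Document" | "Video" | "Link";
type InputKind = "file" | "text" | "url";

interface TabSpec {
  rpcMethod: string; // RPC name from the Backing RPC list
  input: InputKind;  // what the user provides before clicking the test button
}

// Assumption: each tab calls the RPC whose name matches its capability.
const TAB_SPECS: Record<MediaTab, TabSpec> = {
  STT: { rpcMethod: "media.transcribe", input: "file" },
  TTS: { rpcMethod: "media.synthesize", input: "text" },
  Vision: { rpcMethod: "media.analyzeImage", input: "file" },
  Document: { rpcMethod: "media.extractDocument", input: "file" },
  // The upload endpoint is documented for audio/image/document files only;
  // how video files are transferred is not stated, so "file" is a guess.
  Video: { rpcMethod: "media.describeVideo", input: "file" },
  Link: { rpcMethod: "media.fetchLink", input: "url" },
};

// True when the tab's input is a file rather than typed text or a URL.
function needsUpload(tab: MediaTab): boolean {
  return TAB_SPECS[tab].input === "file";
}
```

Keeping the mapping in one table means a new capability would only need one new entry, under the assumption that it follows the same three-step flow.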
Provider Info
At the top of the test view, a summary shows which media providers are currently configured. This is fetched from the media.providers RPC endpoint and shows the active provider for each capability (e.g., OpenAI for STT, ElevenLabs for TTS).
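As an illustration of how such a summary might be rendered, the sketch below formats one line per capability. The response shape of media.providers is not documented here, so the ProviderStatus interface and its field names are assumptions, as is the summarizeProviders helper.

```typescript
// Hypothetical shape of one entry in the media.providers result.
// The real field names may differ.
interface ProviderStatus {
  capability: string;      // e.g. "STT"
  provider: string | null; // null when no provider is configured
  credentialsOk: boolean;  // whether the credentials work
}

// Builds one human-readable summary line per capability.
function summarizeProviders(entries: ProviderStatus[]): string[] {
  return entries.map((entry) => {
    if (entry.provider === null) {
      return `${entry.capability}: not configured`;
    }
    const state = entry.credentialsOk ? "ok" : "credentials failing";
    return `${entry.capability}: ${entry.provider} (${state})`;
  });
}
```

For example, an entry for STT with provider OpenAI and working credentials would be rendered as "STT: OpenAI (ok)".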
Media Configuration
The configuration panel (at /media/config) shows provider status and current settings for all five media subsystems:
| Subsystem | What It Configures |
|---|---|
| STT | Speech-to-Text provider (OpenAI, Groq, Deepgram), model selection, language |
| TTS | Text-to-Speech provider (OpenAI, ElevenLabs, Edge TTS), voice selection, speed |
| Vision | Vision analysis provider, model, max tokens for descriptions |
| Document Extraction | PDF and document processing settings |
| Link Understanding | Web content extraction and readability settings |
For each subsystem, the panel shows:
- Status indicator: whether the provider is configured and operational
- Current settings: the active provider, model, and key configuration values
- Edit link: an “Edit in Config Editor” link that navigates to the relevant config section for detailed editing
Common Tasks
Verify STT is working
Go to /media, select the STT tab, upload an audio file, and click Test. The transcription should appear with the provider name.
Test TTS synthesis
Go to /media, select the TTS tab, enter some text, and click Synthesize. An audio player should appear so you can play the generated speech.
Check media provider configuration
Go to /media/config to see all five media subsystems with their status and settings. Use the “Edit in Config Editor” links to change providers.
Related Pages
Media & Voice Overview
Full documentation for all media capabilities.
Vision
Detailed vision processing documentation.
Voice
Speech-to-text and text-to-speech documentation.
Config Editor
Edit media provider configuration in the YAML editor.
