The Media section provides two views: a test playground for verifying media processing capabilities, and a configuration panel for reviewing and managing media provider settings. Together, they let you confirm that vision, speech, documents, and other media features are working correctly before relying on them in production.

Routes

  • /media — opens the Media Test playground
  • /media/config — opens the Media Configuration panel

Backing RPC

  • media.providers — which providers are configured for each capability and whether the credentials work
  • media.transcribe, media.synthesize, media.analyzeImage, media.describeVideo, media.extractDocument, media.fetchLink — one RPC per capability
  • POST /media/upload (HTTP, not RPC) — multipart upload for audio/image/document files
  • config.patch(section: "media", ...) — save provider/model changes via the Edit in Config Editor links

Note: test calls hit the real provider APIs, so they count against your provider quota and incur real costs.
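
As a rough illustration of how the RPC names above map onto the six capabilities, the sketch below builds a request body for each call. The wire format is not documented here: the JSON-RPC 2.0 style envelope, the short capability keys (`stt`, `tts`, and so on), and the `url` parameter are all assumptions made for the example. Only the method names come from the list above.

```python
import json
from itertools import count
from typing import Optional

# Keys are labels chosen for this sketch; values are the RPC names listed above.
CAPABILITY_METHODS = {
    "stt": "media.transcribe",
    "tts": "media.synthesize",
    "vision": "media.analyzeImage",
    "video": "media.describeVideo",
    "document": "media.extractDocument",
    "link": "media.fetchLink",
}

_request_ids = count(1)  # simple incrementing request id


def build_rpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one call in a JSON-RPC 2.0 style envelope (assumed format)."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": method,
            "params": params or {},
        }
    )


def build_capability_request(capability: str, params: dict) -> str:
    """Look up the RPC for a capability and build its request body."""
    if capability not in CAPABILITY_METHODS:
        raise ValueError(f"unknown capability: {capability!r}")
    return build_rpc_request(CAPABILITY_METHODS[capability], params)


if __name__ == "__main__":
    # "url" is a hypothetical parameter name for media.fetchLink.
    print(build_capability_request("link", {"url": "https://example.com"}))
    print(build_rpc_request("media.providers"))
```

Keeping the capability-to-method table in one place means a client only has to change one dictionary if a method is renamed.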

What You See

The Media section is split into two views accessible from the sidebar under a single “Media” item.

Media Test Playground

The test view has six tabs, one for each media processing capability:
Tab | What It Tests
STT | Speech-to-Text: upload an audio file (up to 25 MB) and get a transcription with provider info
TTS | Text-to-Speech: enter text, synthesize speech, and play the audio in your browser
Vision | Image analysis: upload an image (up to 20 MB) and get a description or analysis from the vision provider
Document | Document extraction: upload a PDF, CSV, or other document (up to 50 MB) and get the extracted text content
Video | Video analysis: upload a video file (up to 50 MB) and get a description or analysis
Link | Link enrichment: enter a URL and get the extracted content, title, and metadata
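
The per-tab upload limits in the table can be checked before sending a file. The following is a minimal sketch, not the playground's actual validation: it assumes 1 MB means 1024 × 1024 bytes and that a file exactly at the limit is accepted, neither of which the table states. The tab keys are labels chosen for the example.

```python
MB = 1024 * 1024  # assumption: limits are binary megabytes

# Limits taken from the table above; TTS and Link take text/URL input, not files.
UPLOAD_LIMITS_MB = {"stt": 25, "vision": 20, "document": 50, "video": 50}


def check_upload_size(tab: str, size_bytes: int) -> tuple:
    """Return (ok, message) for a file of size_bytes on the given tab."""
    if tab not in UPLOAD_LIMITS_MB:
        raise ValueError(f"no file upload limit listed for tab {tab!r}")
    limit_mb = UPLOAD_LIMITS_MB[tab]
    if size_bytes > limit_mb * MB:
        return False, f"file exceeds the {limit_mb} MB limit for {tab}"
    return True, f"file is within the {limit_mb} MB limit for {tab}"


if __name__ == "__main__":
    print(check_upload_size("stt", 10 * MB))
    print(check_upload_size("vision", 21 * MB))
```

Checking the size locally avoids spending an upload on a file the server would presumably reject.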

How Testing Works

Each tab follows the same pattern:
  1. Select or upload the input (file upload or text/URL input)
  2. Click the test button
  3. View the result, which includes the processing output and provider information
The view checks media provider availability on load and shows which providers are configured and ready. If a provider is not available, the corresponding tab will indicate that the capability is not configured.
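
The same pattern (check availability, run the test, show output with provider information) can be sketched for a scripted client. This is only an illustration: the shape of the provider map (`provider` and `available` keys), the `uploadId` parameter, and the injected `call_rpc` transport are hypothetical and not taken from the actual API.

```python
from typing import Any, Callable, Dict


def run_media_test(
    capability: str,
    method: str,
    params: Dict[str, Any],
    providers: Dict[str, Dict[str, Any]],
    call_rpc: Callable[[str, Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Run one capability test, skipping the call if no provider is ready."""
    info = providers.get(capability)
    if not info or not info.get("available"):
        # Mirrors the tab showing that the capability is not configured.
        return {"ok": False, "reason": f"{capability} is not configured"}
    output = call_rpc(method, params)  # this is the billed provider call
    return {"ok": True, "provider": info.get("provider"), "output": output}


if __name__ == "__main__":
    # Stand-in transport so the sketch runs without a server.
    def fake_rpc(method: str, params: Dict[str, Any]) -> Dict[str, Any]:
        return {"text": "hello world"}

    providers = {"stt": {"provider": "OpenAI", "available": True}}
    print(run_media_test("stt", "media.transcribe", {"uploadId": "u1"}, providers, fake_rpc))
    print(run_media_test("tts", "media.synthesize", {"text": "hi"}, providers, fake_rpc))
```

Gating on availability first keeps an unconfigured capability from producing a confusing provider error, and it avoids a billed call that cannot succeed.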

Provider Info

At the top of the test view, a summary shows which media providers are currently configured. This is fetched from the media.providers RPC endpoint and shows the active provider for each capability (e.g., OpenAI for STT, ElevenLabs for TTS).
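
To make the summary concrete, here is a small sketch that turns a `media.providers` result into one line per capability. The response shape is an assumption (a map from capability key to `provider` and `available` fields); the real payload may differ.

```python
from typing import Any, Dict, List, Sequence

# Capability keys are labels chosen for this sketch, one per test tab.
ALL_CAPABILITIES = ("stt", "tts", "vision", "document", "video", "link")


def summarize_providers(
    providers: Dict[str, Dict[str, Any]],
    capabilities: Sequence[str] = ALL_CAPABILITIES,
) -> List[str]:
    """Build one human-readable status line per capability."""
    lines = []
    for cap in capabilities:
        info = providers.get(cap)
        if not info:
            lines.append(f"{cap.upper()}: not configured")
            continue
        state = "ready" if info.get("available") else "unavailable"
        lines.append(f"{cap.upper()}: {info.get('provider', 'unknown')} ({state})")
    return lines


if __name__ == "__main__":
    sample = {
        "stt": {"provider": "OpenAI", "available": True},
        "tts": {"provider": "ElevenLabs", "available": False},
    }
    for line in summarize_providers(sample):
        print(line)
```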

Media Configuration

The configuration panel (at /media/config) shows provider status and current settings for all five media subsystems:
Subsystem | What It Configures
STT | Speech-to-Text provider (OpenAI, Groq, Deepgram), model selection, language
TTS | Text-to-Speech provider (OpenAI, ElevenLabs, Edge TTS), voice selection, speed
Vision | Vision analysis provider, model, max tokens for descriptions
Document Extraction | PDF and document processing settings
Link Understanding | Web content extraction and readability settings
Each subsystem is displayed as a card with:
  • Status indicator: whether the provider is configured and operational
  • Current settings: the active provider, model, and key configuration values
  • Edit link: an “Edit in Config Editor” link that navigates to the relevant config section for detailed editing
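
Changes made through the edit links end up as a `config.patch` call with `section: "media"`. The sketch below shows one way such a call's parameters might be assembled. Only the `section` value comes from this page: the `patch` key, the subsystem keys, and the setting names (`provider`, `model`) are hypothetical.

```python
from typing import Any, Dict

# The five subsystems shown on the configuration panel (keys are this sketch's labels).
MEDIA_SUBSYSTEMS = ("stt", "tts", "vision", "document", "link")


def build_media_patch(subsystem: str, changes: Dict[str, Any]) -> Dict[str, Any]:
    """Build params for a config.patch call that updates one media subsystem."""
    if subsystem not in MEDIA_SUBSYSTEMS:
        raise ValueError(f"unknown media subsystem: {subsystem!r}")
    if not changes:
        raise ValueError("no changes to apply")
    # "patch" is an assumed key name; only section="media" is documented.
    return {"section": "media", "patch": {subsystem: dict(changes)}}


if __name__ == "__main__":
    # Hypothetical setting names for switching the STT provider.
    print(build_media_patch("stt", {"provider": "groq", "model": "example-model"}))
```

Scoping each patch to a single subsystem keeps an edit to one card from overwriting another card's settings.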

Common Tasks

  1. Verify STT is working
     Go to /media, select the STT tab, upload an audio file, and click Test. The transcription should appear with the provider name.

  2. Test TTS synthesis
     Go to /media, select the TTS tab, enter some text, and click Synthesize. An audio player should appear letting you play the generated speech.

  3. Check media provider configuration
     Go to /media/config to see all five media subsystems with their status and settings. Use the “Edit in Config Editor” links to change providers.

  4. Test vision analysis
     Go to /media, select the Vision tab, upload an image, and click Analyze. The analysis result shows what the vision model detected in the image.
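
The upload step in tasks such as verifying STT or testing vision could also be scripted against `POST /media/upload`. As a sketch under assumptions, the helper below builds a multipart/form-data body using only the standard library. The form field name `file` and the sample filename are hypothetical, and the response format of the upload endpoint is not covered here.

```python
import uuid
from typing import Dict, Optional, Tuple


def build_multipart(
    field_name: str,
    filename: str,
    content: bytes,
    content_type: str = "application/octet-stream",
    boundary: Optional[str] = None,
) -> Tuple[bytes, Dict[str, str]]:
    """Encode a single file as a multipart/form-data body plus headers."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + content + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return body, headers


if __name__ == "__main__":
    # "file" is an assumed field name; the bytes stand in for real audio data.
    body, headers = build_multipart("file", "sample.wav", b"RIFF....", "audio/wav")
    print(headers["Content-Type"])
    print(len(body), "bytes ready to POST to /media/upload")
```

The returned body and headers can be passed to any HTTP client; keep in mind that the follow-up test call is billed by the provider.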

Related pages:

  • Media & Voice Overview: full documentation for all media capabilities
  • Vision: detailed vision processing documentation
  • Voice: speech-to-text and text-to-speech documentation
  • Config Editor: edit media provider configuration in the YAML editor