The Whisper API supports real-time, low-latency transcription via WebSockets. Stream audio from microphones, files, or any source and receive transcription results as they’re processed.
Connection
Endpoint
WS /v1/listen
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| token | string | — | Required. Your API key |
| model | string | tiny.en | Model to use for transcription |
| language | string | en | BCP-47 language code |
| encoding | string | linear16 | Audio encoding format |
| sample_rate | integer | 16000 | PCM sample rate in Hz |
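The query string can be assembled from the parameters above with the standard library. The `build_listen_url` helper below is illustrative, not part of the API; only the parameter names come from the table.

```python
from urllib.parse import urlencode

def build_listen_url(base, token, **params):
    """Build the /v1/listen WebSocket URL from the table's query parameters.

    Any parameter not supplied falls back to the server-side default.
    """
    query = {"token": token, **params}
    return f"{base}/v1/listen?{urlencode(query)}"

url = build_listen_url(
    "ws://localhost:7860",
    token="YOUR_API_KEY",
    model="tiny.en",
    language="en",
    encoding="linear16",
    sample_rate=16000,
)
print(url)
```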
Supported Encodings
| Encoding | Description |
|---|---|
| linear16 / pcm16 | Raw 16-bit PCM (default, lowest latency) |
| wav | WAV container |
| webm | WebM container (browser MediaRecorder) |
| ogg / opus | OGG/Opus container |
| mp3 | MP3 compressed |
| flac | FLAC lossless |
| mp4 / m4a | MP4/M4A container |
| auto | Server auto-detects the format |
Connecting
wscat
wscat -c "ws://localhost:7860/v1/listen?token=YOUR_API_KEY&model=tiny.en&language=en"
Python
import asyncio
import websockets

async def stream():
    uri = "ws://localhost:7860/v1/listen?token=YOUR_API_KEY&model=tiny.en"
    async with websockets.connect(uri) as ws:
        # Read and send audio chunks, pacing to simulate real time
        with open("audio.wav", "rb") as f:
            while chunk := f.read(8000):
                await ws.send(chunk)
                await asyncio.sleep(0.25)
        # Signal end of stream
        await ws.send('{"type": "CloseStream"}')
        # Receive results
        async for message in ws:
            print(message)

asyncio.run(stream())
JavaScript
const ws = new WebSocket(
'ws://localhost:7860/v1/listen?token=YOUR_API_KEY&model=tiny.en'
);
ws.onopen = () => {
console.log('Connected');
// Send audio chunks via ws.send(audioBuffer)
};
ws.onmessage = (event) => {
const result = JSON.parse(event.data);
if (result.type === 'Results') {
console.log(result.channel.alternatives[0].transcript);
}
};
ws.onclose = () => console.log('Disconnected');
Sending Audio Data
PCM Format Specifications
For encoding=linear16 (default, best latency):
| Property | Value |
|---|---|
| Sample Rate | 16,000 Hz |
| Bit Depth | 16-bit |
| Channels | 1 (Mono) |
| Endianness | Little-Endian |
| Recommended Chunk Size | 8,000 bytes (~250ms) |
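A quick way to sanity-check your pipeline against this spec is to synthesize conformant audio yourself. The sketch below generates one second of a 440 Hz sine tone as 16 kHz, 16-bit, little-endian mono PCM and splits it into the recommended 8,000-byte chunks; the tone and helper names are purely illustrative.

```python
import math
import struct

SAMPLE_RATE = 16000   # Hz, per the spec above
CHUNK_BYTES = 8000    # ~250 ms of 16-bit mono audio

def tone_pcm16(freq_hz=440.0, seconds=1.0, amplitude=0.3):
    """Generate 16-bit little-endian mono PCM for a sine tone."""
    n = int(SAMPLE_RATE * seconds)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)  # '<h' = little-endian int16

def chunked(data, size=CHUNK_BYTES):
    """Split a PCM byte string into fixed-size chunks for streaming."""
    return [data[i:i + size] for i in range(0, len(data), size)]

pcm = tone_pcm16()
chunks = chunked(pcm)
# 1 s at 16 kHz * 2 bytes/sample = 32,000 bytes -> 4 chunks of 8,000 bytes
```

Each chunk can then be passed directly to `ws.send()` as in the connection examples above.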
Compressed Formats
For encoding=webm, ogg, mp3, etc., send raw container bytes. The server handles decoding internally using FFmpeg.
Server Responses
1. Initial Metadata
On connection, the server immediately sends a metadata message:
{
"type": "Metadata",
"request_id": "c4937a39-3482-414b-be42-2750043044f2",
"model_info": {
"tiny.en": {
"name": "whisper-tiny.en",
"version": "ggml-v1",
"arch": "whisper"
}
},
"channels": 1,
"created": "2026-03-30T00:03:18.621907Z"
}
2. Transcription Results
As audio is buffered and processed, the server streams JSON results:
{
"type": "Results",
"channel_index": [0, 1],
"duration": 2.05,
"start": 0.0,
"is_final": true,
"speech_final": false,
"channel": {
"alternatives": [
{
"transcript": "Hello world",
"confidence": 0.98,
"words": [
{ "word": "hello", "start": 0.0, "end": 0.5, "confidence": 0.97 },
{ "word": "world", "start": 0.5, "end": 1.0, "confidence": 0.99 }
]
}
],
"detected_language": "en"
},
"metadata": {
"request_id": "c4937a39-3482-414b-be42-2750043044f2",
"model_info": {
"tiny.en": {
"name": "whisper-tiny.en",
"version": "ggml-v1",
"arch": "whisper"
}
}
},
"from_finalize": false
}
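A client typically filters incoming messages by `type` and pulls the transcript from the first alternative. This minimal parser assumes only the field names shown in the example payload above:

```python
import json

def extract_transcript(message: str):
    """Return (transcript, words) from a 'Results' message, else None."""
    result = json.loads(message)
    if result.get("type") != "Results":
        return None
    alt = result["channel"]["alternatives"][0]
    return alt["transcript"], alt.get("words", [])

# Exercise the parser with a trimmed-down Results payload:
sample = json.dumps({
    "type": "Results",
    "channel": {"alternatives": [{
        "transcript": "Hello world",
        "confidence": 0.98,
        "words": [{"word": "hello", "start": 0.0, "end": 0.5, "confidence": 0.97}],
    }]},
})
transcript, words = extract_transcript(sample)
```

Non-`Results` messages (such as the initial `Metadata`) return `None` and can be ignored or logged.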
Control Messages
Clients can send JSON-formatted control messages at any time:
Keep Alive
Prevents the connection from timing out during silence:
{ "type": "KeepAlive" }
Close Stream
Signals the server to process any remaining buffered audio and gracefully close the session:
{ "type": "CloseStream" }
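During long silences a client can run a background task that sends `KeepAlive` on an interval. This is a sketch, not part of the API: `send` stands in for `ws.send`, and the interval is an assumed value that should be tuned to the server's actual idle timeout.

```python
import asyncio
import json

async def keepalive(send, stop: asyncio.Event, interval: float = 5.0):
    """Send a KeepAlive message every `interval` seconds until stopped.

    `send` is any awaitable callable taking the message text, e.g. ws.send.
    The 5-second default is an assumption, not a documented server value.
    """
    while not stop.is_set():
        await send(json.dumps({"type": "KeepAlive"}))
        try:
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass  # interval elapsed with no stop signal; send again

# Demo with a stub `send` that records messages instead of using a socket:
async def demo():
    sent = []
    stop = asyncio.Event()

    async def fake_send(msg):
        sent.append(msg)

    task = asyncio.create_task(keepalive(fake_send, stop, interval=0.01))
    await asyncio.sleep(0.05)
    stop.set()
    await task
    return sent

messages = asyncio.run(demo())
```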
Processing Pipeline
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Client │────▶│ Buffer │────▶│ Decode │────▶│ Transcribe │
│ (Audio Src) │ │ (Accumulate │ │ (FFmpeg if │ │ (whisper- │
│ │ │ chunks) │ │ compressed) │ │ cli) │
└──────────────┘ └──────────────┘ └──────────────┘ └──────┬───────┘
│
┌──────────────┐ │
│ Client │◀─────────────────────────────────┘
│ (JSON Resp) │ Results (Deepgram format)
└──────────────┘
The buffer accumulates audio for the configured STREAM_CHUNK_DURATION_MS (default: 5000ms) before processing each segment.
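For linear16 input, the buffered duration translates directly into a byte count, which is worth knowing when reasoning about memory and latency. A small calculation, assuming the default 16 kHz mono 16-bit format:

```python
def buffer_bytes(duration_ms, sample_rate=16000, bytes_per_sample=2, channels=1):
    """Bytes of linear16 audio buffered per processing segment."""
    return int(sample_rate * (duration_ms / 1000) * bytes_per_sample * channels)

default = buffer_bytes(5000)      # default STREAM_CHUNK_DURATION_MS
low_latency = buffer_bytes(2000)  # a lower-latency setting
# 5000 ms at 16 kHz mono 16-bit -> 160,000 bytes per segment
```

Note that 250 ms works out to 8,000 bytes, which is where the recommended chunk size above comes from.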
Example Scripts
File-Based Streaming Test
Stream a pre-recorded file to simulate real-time input:
python examples/test_streaming.py \
--token YOUR_API_KEY \
--audio audio/jfk.wav \
--model tiny.en
Live Microphone Transcription
Stream audio directly from your microphone:
python examples/mic_transcription.py \
--token YOUR_API_KEY \
--model tiny.en \
--device 3
List Audio Devices
Find the correct microphone device index:
python examples/mic_transcription.py --list-devices
Troubleshooting
| Issue | Solution |
|---|---|
| Empty transcriptions | Ensure audio is 16kHz mono PCM. Check encoding parameter matches actual format. |
| Connection rejected | Verify your API token is valid and passed as ?token=... query parameter. |
| High latency | Use linear16 encoding and smaller STREAM_CHUNK_DURATION_MS (e.g., 2000). |
| Garbled output | Check sample rate matches. Use encoding=auto for non-PCM formats. |