The Whisper API supports real-time, low-latency transcription via WebSockets. Stream audio from microphones, files, or any source and receive transcription results as they’re processed.
Connection
Endpoint
WS /v1/listen
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| token | string | — | Required. Your API key |
| model | string | tiny.en | Model to use for transcription |
| language | string | en | BCP-47 language code |
| encoding | string | linear16 | Audio encoding format |
| sample_rate | integer | 16000 | PCM sample rate in Hz |
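The query string can be assembled from the parameters above with the standard library. The `build_listen_url` helper below is illustrative, not part of the API; only the parameter names come from the table.

```python
from urllib.parse import urlencode

def build_listen_url(base, token, **params):
    """Build the /v1/listen WebSocket URL from the table's query parameters.

    Any parameter not supplied falls back to the server-side default.
    """
    query = {"token": token, **params}
    return f"{base}/v1/listen?{urlencode(query)}"

url = build_listen_url(
    "ws://localhost:7860",
    token="YOUR_API_KEY",
    model="tiny.en",
    language="en",
    encoding="linear16",
    sample_rate=16000,
)
print(url)
```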
Supported Encodings
| Encoding | Description |
|---|---|
| linear16 / pcm16 | Raw 16-bit PCM (default, lowest latency) |
| wav | WAV container |
| webm | WebM container (browser MediaRecorder) |
| ogg / opus | OGG/Opus container |
| mp3 | MP3 compressed |
| flac | FLAC lossless |
| mp4 / m4a | MP4/M4A container |
| auto | Server auto-detects the format |
Connecting
wscat
wscat -c "ws://localhost:7860/v1/listen?token=YOUR_API_KEY&model=tiny.en&language=en"
Python
import asyncio
import websockets

async def stream():
    uri = "ws://localhost:7860/v1/listen?token=YOUR_API_KEY&model=tiny.en"
    async with websockets.connect(uri) as ws:
        # Read and send audio chunks, pacing to simulate real time
        with open("audio.wav", "rb") as f:
            while chunk := f.read(8000):
                await ws.send(chunk)
                await asyncio.sleep(0.25)
        # Signal end of stream
        await ws.send('{"type": "CloseStream"}')
        # Receive results
        async for message in ws:
            print(message)

asyncio.run(stream())
JavaScript
const ws = new WebSocket(
'ws://localhost:7860/v1/listen?token=YOUR_API_KEY&model=tiny.en'
);
ws.onopen = () => {
console.log('Connected');
// Send audio chunks via ws.send(audioBuffer)
};
ws.onmessage = (event) => {
const result = JSON.parse(event.data);
if (result.type === 'Results') {
console.log(result.channel.alternatives[0].transcript);
}
};
ws.onclose = () => console.log('Disconnected');
Sending Audio Data
PCM Format Specifications
For encoding=linear16 (default, best latency):
| Property | Value |
|---|---|
| Sample Rate | 16,000 Hz |
| Bit Depth | 16-bit |
| Channels | 1 (Mono) |
| Endianness | Little-Endian |
| Recommended Chunk Size | 8,000 bytes (~250ms) |
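A quick way to sanity-check your pipeline against this spec is to synthesize conformant audio yourself. The sketch below generates one second of a 440 Hz sine tone as 16 kHz, 16-bit, little-endian mono PCM and splits it into the recommended 8,000-byte chunks; the tone and helper names are purely illustrative.

```python
import math
import struct

SAMPLE_RATE = 16000   # Hz, per the spec above
CHUNK_BYTES = 8000    # ~250 ms of 16-bit mono audio

def tone_pcm16(freq_hz=440.0, seconds=1.0, amplitude=0.3):
    """Generate 16-bit little-endian mono PCM for a sine tone."""
    n = int(SAMPLE_RATE * seconds)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)  # '<h' = little-endian int16

def chunked(data, size=CHUNK_BYTES):
    """Split a PCM byte string into fixed-size chunks for streaming."""
    return [data[i:i + size] for i in range(0, len(data), size)]

pcm = tone_pcm16()
chunks = chunked(pcm)
# 1 s at 16 kHz * 2 bytes/sample = 32,000 bytes -> 4 chunks of 8,000 bytes
```

Each chunk can then be passed directly to `ws.send()` as in the connection examples above.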
Compressed Formats
For encoding=webm, ogg, mp3, etc., send raw container bytes. The server handles decoding internally using FFmpeg.
Server Responses
1. Initial Metadata
On connection, the server immediately sends a metadata message:
{
"type": "Metadata",
"request_id": "c4937a39-3482-414b-be42-2750043044f2",
"model_info": {
"tiny.en": {
"name": "whisper-tiny.en",
"version": "ggml-v1",
"arch": "whisper"
}
},
"channels": 1,
"created": "2026-03-30T00:03:18.621907Z"
}
2. Transcription Results
As audio is buffered and processed, the server streams JSON results:
{
"type": "Results",
"channel_index": [0, 1],
"duration": 2.05,
"start": 0.0,
"is_final": true,
"speech_final": false,
"channel": {
"alternatives": [
{
"transcript": "Hello world",
"confidence": 0.98,
"words": [
{ "word": "hello", "start": 0.0, "end": 0.5, "confidence": 0.97 },
{ "word": "world", "start": 0.5, "end": 1.0, "confidence": 0.99 }
]
}
],
"detected_language": "en"
},
"metadata": {
"request_id": "c4937a39-3482-414b-be42-2750043044f2",
"model_info": {
"tiny.en": {
"name": "whisper-tiny.en",
"version": "ggml-v1",
"arch": "whisper"
}
}
},
"from_finalize": false
}
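A client typically filters incoming messages by `type` and pulls the transcript from the first alternative. This minimal parser assumes only the field names shown in the example payload above:

```python
import json

def extract_transcript(message: str):
    """Return (transcript, words) from a 'Results' message, else None."""
    result = json.loads(message)
    if result.get("type") != "Results":
        return None
    alt = result["channel"]["alternatives"][0]
    return alt["transcript"], alt.get("words", [])

# Exercise the parser with a trimmed-down Results payload:
sample = json.dumps({
    "type": "Results",
    "channel": {"alternatives": [{
        "transcript": "Hello world",
        "confidence": 0.98,
        "words": [{"word": "hello", "start": 0.0, "end": 0.5, "confidence": 0.97}],
    }]},
})
transcript, words = extract_transcript(sample)
```

Non-`Results` messages (such as the initial `Metadata`) return `None` and can be ignored or logged.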
Control Messages
Clients can send JSON-formatted control messages at any time:
Keep Alive
Prevents the connection from timing out during silence:
{ "type": "KeepAlive" }
Close Stream
Signals the server to process any remaining buffered audio and gracefully close the session:
{ "type": "CloseStream" }
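During long silences a client can run a background task that sends `KeepAlive` on an interval. This is a sketch, not part of the API: `send` stands in for `ws.send`, and the interval is an assumed value that should be tuned to the server's actual idle timeout.

```python
import asyncio
import json

async def keepalive(send, stop: asyncio.Event, interval: float = 5.0):
    """Send a KeepAlive message every `interval` seconds until stopped.

    `send` is any awaitable callable taking the message text, e.g. ws.send.
    The 5-second default is an assumption, not a documented server value.
    """
    while not stop.is_set():
        await send(json.dumps({"type": "KeepAlive"}))
        try:
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass  # interval elapsed with no stop signal; send again

# Demo with a stub `send` that records messages instead of using a socket:
async def demo():
    sent = []
    stop = asyncio.Event()

    async def fake_send(msg):
        sent.append(msg)

    task = asyncio.create_task(keepalive(fake_send, stop, interval=0.01))
    await asyncio.sleep(0.05)
    stop.set()
    await task
    return sent

messages = asyncio.run(demo())
```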
Processing Pipeline
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Client │────▶│ Buffer │────▶│ Decode │────▶│ Transcribe │
│ (Audio Src) │ │ (Accumulate │ │ (FFmpeg if │ │ (whisper- │
│ │ │ chunks) │ │ compressed) │ │ cli) │
└──────────────┘ └──────────────┘ └──────────────┘ └──────┬───────┘
│
┌──────────────┐ │
│ Client │◀─────────────────────────────────┘
│ (JSON Resp) │ Results (Deepgram format)
└──────────────┘
The buffer accumulates audio for the configured STREAM_CHUNK_DURATION_MS (default: 5000ms) before processing each segment.
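For linear16 input, the buffered duration translates directly into a byte count, which is worth knowing when reasoning about memory and latency. A small calculation, assuming the default 16 kHz mono 16-bit format:

```python
def buffer_bytes(duration_ms, sample_rate=16000, bytes_per_sample=2, channels=1):
    """Bytes of linear16 audio buffered per processing segment."""
    return int(sample_rate * (duration_ms / 1000) * bytes_per_sample * channels)

default = buffer_bytes(5000)      # default STREAM_CHUNK_DURATION_MS
low_latency = buffer_bytes(2000)  # a lower-latency setting
# 5000 ms at 16 kHz mono 16-bit -> 160,000 bytes per segment
```

Note that 250 ms works out to 8,000 bytes, which is where the recommended chunk size above comes from.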
Example Scripts
File-Based Streaming Test
Stream a pre-recorded file to simulate real-time input:
python examples/test_streaming.py \
--token YOUR_API_KEY \
--audio audio/jfk.wav \
--model tiny.en
Live Microphone Transcription
Stream audio directly from your microphone:
python examples/mic_transcription.py \
--token YOUR_API_KEY \
--model tiny.en \
--device 3
List Audio Devices
Find the correct microphone device index:
python examples/mic_transcription.py --list-devices
Troubleshooting
| Issue | Solution |
|---|---|
| Empty transcriptions | Ensure audio is 16kHz mono PCM. Check encoding parameter matches actual format. |
| Connection rejected | Verify your API token is valid and passed as ?token=... query parameter. |
| High latency | Use linear16 encoding and smaller STREAM_CHUNK_DURATION_MS (e.g., 2000). |
| Garbled output | Check sample rate matches. Use encoding=auto for non-PCM formats. |