What is Whisper API?
Whisper API wraps the blazing-fast whisper.cpp engine with a clean, Deepgram-compatible REST and WebSocket interface. Deploy it on your own infrastructure — your audio never leaves your servers.
It handles everything from simple file uploads to real-time microphone streaming, with built-in subtitle export and speaker diarization.
Core Capabilities
Deepgram-Compatible API
Drop-in replacement for /v1/listen endpoints. Migrate existing Deepgram integrations with minimal code changes.
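To make the Deepgram-compatible shape concrete, here is a minimal sketch of building a `/v1/listen` request. The host, port, model name, and query flags (`punctuate`, `diarize`) are illustrative assumptions modeled on Deepgram's conventions, not confirmed defaults of this server.

```python
# Sketch: a Deepgram-style transcription request against a self-hosted
# Whisper API instance. The host/port and query parameters below are
# assumptions, not documented defaults.
from urllib.parse import urlencode

BASE_URL = "http://localhost:7860/v1/listen"  # assumed local deployment

params = {
    "model": "base.en",      # hypothetical model name
    "punctuate": "true",     # Deepgram-style query flags
    "diarize": "true",
}
url = f"{BASE_URL}?{urlencode(params)}"
print(url)

# Actually sending the request needs a running server, so it is left
# commented out (uses the third-party `requests` package):
# import requests
# with open("audio.wav", "rb") as f:
#     resp = requests.post(url, data=f,
#                          headers={"Authorization": "Token YOUR_API_KEY",
#                                   "Content-Type": "audio/wav"})
# print(resp.json())
```

Because the query interface mirrors Deepgram's, an existing integration usually only needs its base URL swapped to point at this server.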
Real-Time Streaming
WebSocket endpoint for live transcription. Accepts PCM, WebM, OGG, and FLAC input, with automatic format detection.
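A streaming client typically slices raw audio into fixed-duration frames before sending them over the socket. The sketch below shows the arithmetic for one plausible setup; the 16 kHz / 16-bit mono format and 100 ms frame size are illustrative assumptions, not documented server requirements.

```python
# Sketch: splitting raw PCM audio into fixed-duration frames for
# WebSocket streaming. Format and frame size are assumptions.
SAMPLE_RATE = 16_000   # samples per second (assumed)
BYTES_PER_SAMPLE = 2   # 16-bit PCM
FRAME_MS = 100         # send one frame every 100 ms

# 16 000 samples/s * 2 bytes * 0.1 s = 3200 bytes per frame
frame_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * FRAME_MS // 1000

def iter_frames(pcm: bytes):
    """Yield fixed-size PCM frames ready to send over the socket."""
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]

# One second of silence splits into ten 3200-byte frames.
frames = list(iter_frames(b"\x00" * SAMPLE_RATE * BYTES_PER_SAMPLE))
print(len(frames), len(frames[0]))
```

Each frame would then be sent as a binary WebSocket message while transcript messages arrive asynchronously on the same connection.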
Local CPU Transcription
Run transcription directly on your own machine's CPU, with no external cloud dependency.
Offline Key Management
Generate and revoke API keys via CLI. No external auth services required.
Architecture
The server sits between your clients and the whisper.cpp binary, handling audio conversion, authentication, and response formatting.
```
Client (cURL / Python / JS / Deepgram SDK)
        │
        ▼ HTTP POST or WebSocket
FastAPI Server (:7860)
 ├── /v1/listen  POST → Transcribe uploaded file or URL
 ├── /v1/listen  WS   → Real-time streaming transcription
 ├── /v1/models  GET  → List available GGML models
 └── /v1/auth    POST → Test token generation (dev only)
        │
        ▼ subprocess
whisper-cli (compiled from whisper.cpp)
        │
        ▼ reads
models/ggml-*.bin (GGML quantized model files)
```
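The subprocess hop at the bottom of the diagram might look like the sketch below. The `-m` (model) and `-f` (input file) flags are whisper.cpp's standard CLI options; the binary path, model path, and any extra options this server passes are assumptions.

```python
# Sketch: how the server might invoke whisper-cli for one request.
# Paths are placeholders; -m and -f are standard whisper.cpp flags.
model_path = "models/ggml-base.en.bin"  # a GGML quantized model file
audio_path = "audio.wav"                # 16 kHz WAV input

cmd = ["./whisper-cli", "-m", model_path, "-f", audio_path]
print(" ".join(cmd))

# Running it requires the compiled binary and a downloaded model, so the
# actual call is left commented out:
# import subprocess
# result = subprocess.run(cmd, capture_output=True, text=True, check=True)
# print(result.stdout)
```

The server's job around this call is the glue the diagram describes: converting uploaded audio to a format the binary accepts, running it, and reshaping the output into a Deepgram-style JSON response.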
Quick Links
Installation
Clone, install dependencies, build the binary, and start the server.
REST Reference
Parameters, headers, response schemas, and error codes for /v1/listen.
WebSocket Streaming
Connection protocol, audio formats, control messages, and example scripts.
Docker
Production deployment with Docker and Docker Compose.
Stack
| Layer | Technology |
|---|---|
| Runtime | Python 3.10+ / FastAPI |
| Transcription | whisper.cpp (C++ binary via subprocess) |
| Models | GGML quantized .bin files |
| Database | SQLite via SQLAlchemy |
| Streaming | Native FastAPI WebSocket |
| Container | Docker (Debian Bookworm) |
| License | MIT |