The Whisper API uses GGML-format model files (`.bin`) compatible with whisper.cpp. Models live in the `models/` directory and are auto-discovered by the server on startup.
## Available Model Sizes
| Model | Parameters | Disk Size | Relative Speed | Best For |
|---|---|---|---|---|
| tiny | 39M | ~75 MB | Fastest | Quick prototyping, low-resource devices |
| tiny.en | 39M | ~75 MB | Fastest | English-only, highest speed |
| base | 74M | ~142 MB | Fast | Good balance for English |
| base.en | 74M | ~142 MB | Fast | English-only with better accuracy |
| small | 244M | ~466 MB | Medium | Multi-language, good accuracy |
| small.en | 244M | ~466 MB | Medium | English-only, recommended for production |
| medium | 769M | ~1.5 GB | Slow | High accuracy, multi-language |
| medium.en | 769M | ~1.5 GB | Slow | English-only, near-best accuracy |
| large-v2 | 1550M | ~3.1 GB | Slowest | Maximum accuracy, all languages |
| large-v3 | 1550M | ~3.1 GB | Slowest | Latest, best multilingual |
## Quantization Formats
GGML models support multiple quantization levels to trade accuracy for speed and memory:
| Format | Bits | Size Reduction | Quality | Recommended |
|---|---|---|---|---|
| f32 | 32-bit | 1x (baseline) | Highest | No (too large) |
| f16 | 16-bit | ~0.5x | Very High | Research only |
| q8_0 | 8-bit | ~0.25x | High | Yes, if memory allows |
| q5_1 | 5-bit | ~0.16x | Good | Best overall |
| q5_0 | 5-bit | ~0.16x | Good | Alternative to q5_1 |
| q4_1 | 4-bit | ~0.125x | Fair | Max speed priority |
| q4_0 | 4-bit | ~0.125x | Fair | Smallest size |
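The reduction factors above allow a back-of-the-envelope size estimate: multiply the parameter count by 4 bytes (the f32 baseline), then by the reduction factor. The helper below is an illustrative sketch, not part of the API; real files run slightly larger because they also store the vocabulary and metadata.

```shell
# Rough size estimate in MB: params (millions) x 4 bytes (f32 baseline)
# x the quantization reduction factor from the table above.
estimate_mb() {
  awk -v p="$1" -v f="$2" 'BEGIN { printf "%.0f\n", p * 4 * f }'
}

estimate_mb 244 0.16   # small at q5_1: roughly 156 MB
estimate_mb 1550 0.25  # large at q8_0: roughly 1550 MB
```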
## Downloading Models
### From Hugging Face
Official GGML models are available on the whisper.cpp Hugging Face repository:
```bash
# Download tiny.en (q5_1 quantized)
curl -L -o models/ggml-tiny.en-q5_1.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en-q5_1.bin"

# Download small.en (q5_1 quantized)
curl -L -o models/ggml-small.en-q5_1.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.en-q5_1.bin"

# Download base.en (full precision)
curl -L -o models/ggml-base.en.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin"
```
### From the Setup Script
The `setup_whisper.sh` script automatically downloads the tiny.en model if none exists:

```bash
./setup_whisper.sh
```
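Based on the description above, the script's behavior can be sketched roughly as follows; the actual `setup_whisper.sh` may differ in detail.

```shell
# Sketch: download ggml-tiny.en.bin only if models/ contains no .bin file.
ensure_default_model() {
  mkdir -p models
  if ls models/*.bin >/dev/null 2>&1; then
    echo "model already present"
  else
    curl -L -o models/ggml-tiny.en.bin \
      "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin"
  fi
}
```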
## Adding a New Model
1. Download the `.bin` file from Hugging Face or another source.
2. Move it to the `models/` directory:
   ```bash
   mv ggml-small.en-q5_1.bin models/
   ```
3. Restart the API server:
   ```bash
   # Stop the running server (Ctrl+C)
   uvicorn app.main:app --host 0.0.0.0 --port 7860
   ```
4. Verify the model is available:
   ```bash
   curl -X GET 'http://localhost:7860/v1/models' \
     -H "Authorization: Token YOUR_API_KEY"
   ```
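Between moving the file and restarting the server, it can be worth confirming the file actually landed in `models/` and is non-empty, since an interrupted download leaves a file the server cannot load. A minimal check (`check_model` is a hypothetical helper, not part of the API):

```shell
# Report whether a model file exists in models/ and is non-empty.
check_model() {
  if [ -s "models/$1" ]; then
    echo "ok: models/$1"
  else
    echo "missing or empty: models/$1"
  fi
}

check_model ggml-small.en-q5_1.bin
```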
## Model File Naming
The API uses the model filename (without the `ggml-` prefix and `.bin` suffix) as the model identifier in API requests:
| File Name | API `model` Parameter |
|---|---|
| `ggml-tiny.en.bin` | `tiny.en` |
| `ggml-model-whisper-small.en-q5_1.bin` | `model-whisper-small.en-q5_1` |
| `ggml-base.en.bin` | `base.en` |
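The mapping in the table is a pure string transformation, sketched below as a shell function; `model_id` is illustrative, not something the API ships.

```shell
# Derive the API model identifier from a GGML filename: drop any leading
# directory, the "ggml-" prefix, and the ".bin" suffix.
model_id() {
  local f
  f=$(basename "$1")
  f=${f#ggml-}
  printf '%s\n' "${f%.bin}"
}

model_id models/ggml-tiny.en.bin   # → tiny.en
```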
## Model Directory Configuration
The model directory is configured via the `MODELS_DIR` environment variable:

```bash
# .env
MODELS_DIR=./models
```

The server scans this directory on startup and makes all `.bin` files available for transcription.
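The startup scan can be approximated as follows. This is a sketch of the described behavior (glob `*.bin` under `MODELS_DIR` and strip the `ggml-` prefix and `.bin` suffix), not the server's actual code.

```shell
MODELS_DIR=${MODELS_DIR:-./models}

# Print the model identifier for every .bin file found in MODELS_DIR.
list_models() {
  local f name
  for f in "$MODELS_DIR"/*.bin; do
    [ -e "$f" ] || continue   # skip the literal glob when no files match
    name=$(basename "$f" .bin)
    printf '%s\n' "${name#ggml-}"
  done
}
```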