Whisper-Podlodka-Turbo - CoreML / WhisperKit
A CoreML conversion of bond005/whisper-podlodka-turbo packaged for the WhisperKit runtime. Runs end-to-end on the Apple Neural Engine (ANE) on Apple Silicon Macs, iPhone, and iPad.
The upstream model is Ivan Bondarenko's Russian-focused fine-tune of openai/whisper-large-v3-turbo, with improved noise robustness and reduced non-speech hallucinations. This repository contains only the converted weights - no architectural or training changes.
Files
| File | Size | Purpose |
|---|---|---|
| MelSpectrogram.mlmodelc | ~400 KB | Audio preprocessing (log-mel filterbank) |
| AudioEncoder.mlmodelc | ~1.2 GB | 32-layer encoder, FP16 |
| TextDecoder.mlmodelc | ~330 MB | 4-layer turbo decoder, FP16 |
| config.json | - | Hugging Face Whisper config, inherited from the base model |
| generation_config.json | - | Generation defaults, inherited from the base model |
All three .mlmodelc directories are pre-compiled MLProgram assets ready for direct use by WhisperKit. No additional compile step is required.
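Before handing a local folder to WhisperKit, it can help to confirm that the three compiled components listed above are actually present. The following is a minimal sketch using only Foundation; the helper name `missingComponents(in:)` and the placeholder path are illustrative assumptions, not part of WhisperKit.

```swift
import Foundation

// Component directories listed in the Files table above.
let requiredComponents = [
    "MelSpectrogram.mlmodelc",
    "AudioEncoder.mlmodelc",
    "TextDecoder.mlmodelc",
]

/// Returns the names of required components that are not present as directories in `folder`.
/// Hypothetical helper for illustration; WhisperKit performs its own loading checks.
func missingComponents(in folder: URL) -> [String] {
    let fileManager = FileManager.default
    return requiredComponents.filter { name in
        var isDirectory: ObjCBool = false
        let path = folder.appendingPathComponent(name).path
        let exists = fileManager.fileExists(atPath: path, isDirectory: &isDirectory)
        // A compiled .mlmodelc asset is a directory, so a plain file does not count.
        return !(exists && isDirectory.boolValue)
    }
}

// Placeholder path; replace with the location of the downloaded repository.
let modelFolder = URL(fileURLWithPath: "/path/to/whisper-podlodka-turbo-coreml")
let missing = missingComponents(in: modelFolder)
print(missing.isEmpty ? "Model folder looks complete" : "Missing: \(missing.joined(separator: ", "))")
```

The check only looks at directory names; it does not validate the contents of each compiled model.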
Architecture
Inherited from the base model (Whisper Large v3 Turbo):
| Hyperparameter | Value |
|---|---|
| Encoder layers | 32 |
| Decoder layers | 4 |
| Hidden size (d_model) | 1280 |
| Attention heads (enc / dec) | 20 / 20 |
| Mel bins | 128 |
| Vocabulary | 51866 |
| Max source positions | 1500 (30 s @ 16 kHz) |
| Max target positions | 448 |
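The numbers in the table are related by simple arithmetic. The sketch below works through it, assuming the standard Whisper front end (a hop length of 160 samples per mel frame and a stride-2 convolution in the encoder stem); those two values are assumptions taken from the original Whisper design, not stated in this card.

```swift
// Values from the table above.
let sampleRate = 16_000        // Hz
let windowSeconds = 30
let dModel = 1280
let attentionHeads = 20

// Assumptions from the standard Whisper design (not stated in this card).
let hopLength = 160            // audio samples per mel frame
let encoderConvStride = 2      // downsampling in the encoder's convolutional stem

let samplesPerWindow = sampleRate * windowSeconds        // 480_000 samples
let melFrames = samplesPerWindow / hopLength             // 3_000 frames of 128 mel bins
let sourcePositions = melFrames / encoderConvStride      // 1_500, the "max source positions"
let headDimension = dModel / attentionHeads              // 64 dimensions per attention head

print("source positions: \(sourcePositions), head dimension: \(headDimension)")
```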
Conversion
Converted with argmaxinc/whisperkittools 0.4.2:
whisperkit-generate-model \
--model-version bond005/whisper-podlodka-turbo \
--output-dir ./out \
--generate-decoder-context-prefill-data
- Conversion environment: Python 3.12, torch==2.5.0, coremltools==9.0, transformers==4.53
- Compute precision: FP16 across all three components
- Decoder SDPA implementation: Cat (default)
- Audio encoder SDPA implementation: SplitHeadsQ (default)
- Decoder context prefill data: enabled (pre-computes the KV cache for the first 3 forced tokens to reduce time-to-first-token)
- Minimum deployment target: macOS 14 / iOS 17
Usage with WhisperKit (Swift)
import WhisperKit
let folder = URL(fileURLWithPath: "/path/to/whisper-podlodka-turbo-coreml")
let pipe = try await WhisperKit(modelFolder: folder.path)
let result = try await pipe.transcribe(audioPath: "/path/to/audio.wav")
print(result?.text ?? "")
The tokenizer is the standard Whisper Large v3 tokenizer (vocab 51866) - WhisperKit will fetch it from openai/whisper-large-v3 if it is not present locally. Russian is the recommended decoding language for this model.
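Since Russian is the recommended decoding language, it may be worth setting it explicitly rather than leaving the language unset. The sketch below assumes a WhisperKit release in which `DecodingOptions` accepts `task` and `language`, and in which `transcribe(audioPath:decodeOptions:)` returns an array of results; older releases returned a single optional result, so check the signatures against the version you have installed. Paths are placeholders.

```swift
import Foundation
import WhisperKit

// Placeholder paths, as in the snippet above.
let folder = URL(fileURLWithPath: "/path/to/whisper-podlodka-turbo-coreml")
let pipe = try await WhisperKit(modelFolder: folder.path)

// Request Russian transcription explicitly ("ru" is the Whisper language code).
let options = DecodingOptions(task: .transcribe, language: "ru")

// The explicit type annotation selects the array-returning overload
// (assumed to exist in the installed WhisperKit version).
let results: [TranscriptionResult] = try await pipe.transcribe(
    audioPath: "/path/to/audio.wav",
    decodeOptions: options
)

// Join the text of all returned results into one transcript.
print(results.map(\.text).joined(separator: " "))
```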
Performance
End-to-end ANE execution on an M-series Mac yields realtime factors significantly above 1.0x, that is, audio is transcribed faster than it plays back. The first inference compiles ANE-specific kernels and may take noticeably longer; subsequent inferences reuse the cached compilation and are fast.
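To compare a cold first run with later warm runs on your own hardware, a small timing helper is enough. The following is a sketch using only Foundation; `speedFactor` and `timed` are hypothetical helper names, and the numbers at the bottom are illustrative only, not measurements of this model.

```swift
import Foundation

/// Seconds of audio processed per second of wall-clock time.
/// Values above 1.0 mean transcription runs faster than playback.
func speedFactor(audioSeconds: Double, elapsedSeconds: Double) -> Double {
    precondition(elapsedSeconds > 0, "elapsed time must be positive")
    return audioSeconds / elapsedSeconds
}

/// Runs `work` once and returns how long it took, in seconds.
/// Wrap a transcription call in this to time a cold or warm run.
func timed(_ work: () async throws -> Void) async rethrows -> Double {
    let start = Date()
    try await work()
    return Date().timeIntervalSince(start)
}

// Illustrative numbers only, not measurements of this model:
// a 60 s clip taking 12 s on a cold run and 3 s on a warm run.
let coldRun = speedFactor(audioSeconds: 60, elapsedSeconds: 12)  // 5.0
let warmRun = speedFactor(audioSeconds: 60, elapsedSeconds: 3)   // 20.0
print("cold: \(coldRun)x, warm: \(warmRun)x")
```

Timing the first call separately from later calls keeps the one-time ANE compilation cost out of the steady-state figure.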
Languages
Primary: Russian. Secondary: English. The base fine-tune preserves the multilingual capability of Whisper Large v3 Turbo but is optimized for Russian ASR and Russian/English speech translation.
Limitations
- WhisperKit's pipeline currently uses the same scaffolding as standard Whisper Large v3 Turbo. Any quality differences between this fine-tune and the base Turbo model are inherited from the upstream weights.
- Translation behavior is inherited from the upstream fine-tune. Refer to the base model card for translation quality notes.
- For evaluation numbers (WER on Common Voice, RuLibriSpeech, Golos, SOVA RuDevices, Podlodka Speech, plus noise-robust and long-form benchmarks) see the upstream model card.
License
Apache 2.0, inherited from the base model.
Credits
- Base fine-tune: Ivan Bondarenko, bond005/whisper-podlodka-turbo
- Foundation model: OpenAI, whisper-large-v3-turbo
- CoreML conversion toolkit: Argmax, Inc. - whisperkittools
- Runtime: WhisperKit
Citation
For the base fine-tune, cite the upstream model:
@misc{whisper-podlodka-turbo,
author = {Ivan Bondarenko},
title = {Whisper-Podlodka-Turbo: Enhanced Whisper Model for Russian ASR},
year = {2025},
publisher = {Hugging Face},
journal = {Hugging Face Model Hub},
howpublished = {\url{https://huggingface.co/bond005/whisper-podlodka-turbo}}
}