Whisper-Podlodka-Turbo - CoreML / WhisperKit

A CoreML conversion of bond005/whisper-podlodka-turbo packaged for the WhisperKit runtime. Runs end-to-end on the Apple Neural Engine (ANE) on Apple Silicon Macs, iPhone, and iPad.

The upstream model is Ivan Bondarenko's Russian-focused fine-tune of openai/whisper-large-v3-turbo, with improved noise robustness and reduced non-speech hallucinations. This repository contains only the converted weights - no architectural or training changes.

Files

File	Size	Purpose
`MelSpectrogram.mlmodelc`	~400 KB	Audio preprocessing (log-mel filterbank)
`AudioEncoder.mlmodelc`	~1.2 GB	32-layer encoder, FP16
`TextDecoder.mlmodelc`	~330 MB	4-layer turbo decoder, FP16
`config.json`	-	Hugging Face Whisper config, inherited from the base model
`generation_config.json`	-	Generation defaults, inherited from the base model

All three .mlmodelc directories are pre-compiled MLProgram assets ready for direct use by WhisperKit. No additional compile step is required.
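Before handing a downloaded folder to the runtime, it can help to confirm that every entry from the table above is present. The following is a minimal sketch using only Foundation; the helper name `missingModelEntries` and the placeholder path are assumptions for illustration, not part of WhisperKit.

```swift
import Foundation

// Entry names taken from the Files table; the helper below is hypothetical.
let requiredModelEntries = [
    "MelSpectrogram.mlmodelc",
    "AudioEncoder.mlmodelc",
    "TextDecoder.mlmodelc",
    "config.json",
    "generation_config.json",
]

/// Returns the names from `requiredModelEntries` that do not exist in `folder`.
/// `fileExists(atPath:)` is true for both files and directories, so it also
/// covers the compiled `.mlmodelc` directories.
func missingModelEntries(in folder: URL) -> [String] {
    let fileManager = FileManager.default
    return requiredModelEntries.filter { name in
        !fileManager.fileExists(atPath: folder.appendingPathComponent(name).path)
    }
}

// Placeholder path; replace it with the actual download location.
let modelFolder = URL(fileURLWithPath: "/path/to/whisper-podlodka-turbo-coreml")
let missing = missingModelEntries(in: modelFolder)
if missing.isEmpty {
    print("Model folder looks complete")
} else {
    print("Missing entries: \(missing.joined(separator: ", "))")
}
```

The check only looks at names, so it will not catch a truncated or corrupted download.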

Architecture

Inherited from the base model (Whisper Large v3 Turbo):

Hyperparameter	Value
Encoder layers	32
Decoder layers	4
Hidden size (d_model)	1280
Attention heads (enc / dec)	20 / 20
Mel bins	128
Vocabulary	51866
Max source positions	1500 (one 30 s window of 16 kHz audio)
Max target positions	448
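The figure of 1500 source positions follows from the standard Whisper front end. The sketch below walks through the arithmetic; the hop length of 160 samples and the stride-2 convolution come from the reference Whisper implementation rather than from this model card, so treat them as stated assumptions.

```swift
// Values from the table above.
let hiddenSize = 1280
let attentionHeads = 20

// Standard Whisper front-end constants (assumed from the reference
// Whisper implementation; not stated in this model card).
let sampleRate = 16_000      // Hz
let windowSeconds = 30       // fixed audio window per encoder pass
let hopLength = 160          // samples between mel frames (10 ms)
let encoderConvStride = 2    // downsampling in the encoder's convolutional stem

let samplesPerWindow = sampleRate * windowSeconds       // 480_000
let melFrames = samplesPerWindow / hopLength            // 3_000
let encoderPositions = melFrames / encoderConvStride    // 1_500
let millisecondsPerPosition = 1000 * windowSeconds / encoderPositions  // 20
let headDimension = hiddenSize / attentionHeads         // 64

print("samples per window: \(samplesPerWindow)")
print("mel frames: \(melFrames)")
print("encoder positions: \(encoderPositions)")
print("audio per encoder position: \(millisecondsPerPosition) ms")
print("per-head dimension: \(headDimension)")
```

Each encoder position therefore covers 20 ms of audio, which is why audio longer than 30 s has to be processed in several windows.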

Conversion

Converted with argmaxinc/whisperkittools 0.4.2:

whisperkit-generate-model \
  --model-version bond005/whisper-podlodka-turbo \
  --output-dir ./out \
  --generate-decoder-context-prefill-data

Conversion environment: Python 3.12, torch==2.5.0, coremltools==9.0, transformers==4.53
Compute precision: FP16 across all three components
Decoder SDPA implementation: Cat (default)
Audio encoder SDPA implementation: SplitHeadsQ (default)
Decoder context prefill data: enabled (pre-computes the KV cache for the first 3 forced tokens to reduce time-to-first-token)
Minimum deployment target: macOS 14 / iOS 17
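An app that supports older OS versions may want to gate model loading on the minimum deployment target listed above. A minimal sketch, assuming a hypothetical helper named `meetsMinimumDeploymentTarget`:

```swift
import Foundation

/// Hypothetical helper: reports whether the running OS meets the minimum
/// deployment target listed above (macOS 14 / iOS 17).
/// The trailing `*` is required by Swift and makes the check pass on any
/// platform not named explicitly, so it does not rule those platforms out.
func meetsMinimumDeploymentTarget() -> Bool {
    if #available(macOS 14, iOS 17, *) {
        return true
    }
    return false
}

if meetsMinimumDeploymentTarget() {
    print("OS version is new enough for the converted models")
} else {
    print("OS version is below macOS 14 / iOS 17; skip loading the models")
}
```

If the app's own deployment target is already macOS 14 / iOS 17 or later, this runtime check is redundant.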

Usage with WhisperKit (Swift)

import WhisperKit

let folder = URL(fileURLWithPath: "/path/to/whisper-podlodka-turbo-coreml")
let pipe = try await WhisperKit(modelFolder: folder.path)
let result = try await pipe.transcribe(audioPath: "/path/to/audio.wav")
print(result?.text ?? "")

The tokenizer is the standard Whisper Large v3 tokenizer (vocab 51866) - WhisperKit will fetch it from openai/whisper-large-v3 if it is not present locally. Russian is the recommended decoding language for this model.

Performance

End-to-end ANE execution on an M-series Mac transcribes faster than real time: the realtime factor is significantly above 1.0x, meaning the audio takes less time to process than it takes to play. The first inference compiles ANE-specific kernels and may take noticeably longer; subsequent inferences reuse the cached compilation and avoid that cost.
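The realtime factor can be measured as audio duration divided by wall-clock processing time, with the first (cold) run timed separately from later (warm) runs. The sketch below shows one way to do that; the helper names are hypothetical and the `Thread.sleep` calls are stand-ins for real transcription work, so the timings it prints say nothing about this model.

```swift
import Foundation

/// Speed factor = audio duration / processing time.
/// Values above 1.0 mean faster than real time.
func realtimeFactor(audioSeconds: Double, processingSeconds: Double) -> Double {
    precondition(processingSeconds > 0, "processing time must be positive")
    return audioSeconds / processingSeconds
}

/// Runs `work` synchronously and returns its wall-clock duration in seconds.
func measureSeconds(_ work: () throws -> Void) rethrows -> Double {
    let start = Date()
    try work()
    return Date().timeIntervalSince(start)
}

// Stand-in workloads (assumptions for illustration only). In an app, take the
// timestamps around the awaited transcription call instead.
let clipSeconds = 30.0
let coldSeconds = measureSeconds { Thread.sleep(forTimeInterval: 0.05) }
let warmSeconds = measureSeconds { Thread.sleep(forTimeInterval: 0.01) }

print("cold run factor: \(realtimeFactor(audioSeconds: clipSeconds, processingSeconds: coldSeconds))")
print("warm run factor: \(realtimeFactor(audioSeconds: clipSeconds, processingSeconds: warmSeconds))")
```

Reporting the cold and warm numbers separately keeps the one-time ANE compilation from skewing the steady-state figure.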

Languages

Primary: Russian. Secondary: English. The upstream fine-tune preserves the multilingual capability of Whisper Large v3 Turbo but is optimized for Russian ASR and Russian/English speech translation.

Limitations

WhisperKit currently runs this model through the same pipeline it uses for standard Whisper Large v3 Turbo. Any quality differences between this fine-tune and the base Turbo model therefore come from the upstream weights alone.
Translation behavior is inherited from the upstream fine-tune. Refer to the upstream model card (bond005/whisper-podlodka-turbo) for translation quality notes.
For evaluation numbers (WER on Common Voice, RuLibriSpeech, Golos, SOVA RuDevices, Podlodka Speech, plus noise-robust and long-form benchmarks) see the upstream model card.

License

Apache 2.0, inherited from the base model.

Credits

Base fine-tune: Ivan Bondarenko, bond005/whisper-podlodka-turbo
Foundation model: OpenAI, whisper-large-v3-turbo
CoreML conversion toolkit: Argmax, Inc. - whisperkittools
Runtime: WhisperKit

Citation

For the base fine-tune, cite the upstream model:

@misc{whisper-podlodka-turbo,
  author = {Ivan Bondarenko},
  title = {Whisper-Podlodka-Turbo: Enhanced Whisper Model for Russian ASR},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/bond005/whisper-podlodka-turbo}}
}
