VibeVoice-ASR GPTQ INT4
This repository contains a 4-bit GPTQ quantized export of microsoft/VibeVoice-ASR.
Quantization
- Method: GPTQ
- Bits: 4
- Group size: 128
- Logical parameter count: 8,674,021,857
Repository layout
This model is stored in a split VibeVoice layout:
- root directory: VibeVoice audio and non-decoder weights
- decoder-gptq/: quantized Qwen2 decoder weights
Keep this layout intact when downloading or mirroring the repository.
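A quick local check can catch a mirror that dropped the decoder subdirectory. The sketch below relies only on what this section and the Metadata section state: a config.json in the root directory and a decoder-gptq/ directory next to it. The function name check_split_layout is hypothetical, and the check does not inspect individual weight files.

```python
from pathlib import Path


def check_split_layout(root: str) -> list[str]:
    """Return a list of problems found in a local copy of the repository."""
    base = Path(root)
    problems = []
    # The root directory holds config.json alongside the non-decoder weights.
    if not (base / "config.json").is_file():
        problems.append("missing config.json in root directory")
    # The quantized Qwen2 decoder lives in its own subdirectory.
    if not (base / "decoder-gptq").is_dir():
        problems.append("missing decoder-gptq/ directory")
    return problems
```

An empty list means both expected entries were found; it does not prove the weights themselves are complete.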
Metadata
The root config.json includes:
- vibevoice_metadata
- vibevoice_decoder_model_path
- vibevoice_decoder_quantization
These fields identify the split decoder path and preserve the logical source-model metadata.
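To inspect these fields in a downloaded copy, something like the following may be enough. It is a minimal sketch that assumes the three names are top-level keys of config.json; the shape of their values is not documented here, so they are returned as-is. The helper name read_split_fields is hypothetical.

```python
import json
from pathlib import Path

# Field names as listed in this section.
SPLIT_FIELDS = (
    "vibevoice_metadata",
    "vibevoice_decoder_model_path",
    "vibevoice_decoder_quantization",
)


def read_split_fields(config_path: str) -> dict:
    """Load config.json and return the split-layout fields that are present."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Assumption: the fields are top-level keys; values are returned untouched.
    return {name: config[name] for name in SPLIT_FIELDS if name in config}
```

A field missing from the returned dict suggests the config was rewritten or the copy is not the split export described above.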
Validation
This GPTQ export was validated against the full upstream VibeVoice-ASR model on short audio samples.
- outputs remained valid JSON transcript arrays
- output similarity to the full model remained high on tested samples
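The two checks above can be approximated with the standard library. This is only an illustrative sketch: the similarity metric actually used for validation is not stated here, so difflib's character-level ratio stands in for it, and both function names are hypothetical.

```python
import json
from difflib import SequenceMatcher


def is_transcript_array(raw: str) -> bool:
    """True if raw parses as JSON and the top-level value is an array."""
    try:
        return isinstance(json.loads(raw), list)
    except json.JSONDecodeError:
        return False


def output_similarity(reference: str, candidate: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical strings."""
    # Stand-in metric (assumption), not the one used for the reported validation.
    return SequenceMatcher(None, reference, candidate).ratio()
```

Here reference would be the raw output of the full upstream model and candidate the output of this GPTQ export for the same audio sample.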
Serving note for vLLM 0.17.x
On current vLLM 0.17.x CUDA builds, this checkpoint is compatible with the Marlin GPTQ path.
- prefer letting vLLM infer the backend from config.json
- if you must set it explicitly, use gptq_marlin rather than plain gptq
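The backend guidance above can be expressed as a small helper that builds keyword arguments for vLLM's LLM constructor. This is a sketch under assumptions: engine_kwargs and force_marlin are hypothetical names, the model path is whatever local or hub path you use, and the block deliberately does not import vLLM or load the model.

```python
def engine_kwargs(model_path: str, force_marlin: bool = False) -> dict:
    """Build keyword arguments intended for vllm.LLM(...)."""
    kwargs = {"model": model_path}
    if force_marlin:
        # Explicit override: name the Marlin kernel path, not plain "gptq".
        kwargs["quantization"] = "gptq_marlin"
    # Otherwise quantization stays unset so vLLM infers it from config.json.
    return kwargs
```

With vLLM 0.17.x installed and the compatibility patches from this repository applied, the result would be passed on as LLM(**engine_kwargs(path)).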
Note: in current split-VibeVoice testing, this GPTQ export did not show the same VRAM reduction as the AWQ export under vLLM 0.17.x, even when served through the Marlin path. The checkpoint is still published for reproducibility, but AWQ is currently the recommended low-VRAM variant.
Upstream references
- Code: https://github.com/microsoft/VibeVoice
- Base model: https://huggingface.co/microsoft/VibeVoice-ASR
- Report: https://arxiv.org/pdf/2601.18184
Notes
- This is a quantized derivative export, not the original upstream checkpoint.
- Base model licensing and usage terms follow the upstream VibeVoice-ASR release.
- Pure-VibeVoice compatibility patches for vLLM 0.17.x are included under patches/vllm_0_17/.
Model tree for lemuriandezapada/VibeVoice-ASR-gptq-int4
Base model
microsoft/VibeVoice-ASR