VibeVoice-ASR GPTQ INT4

This repository contains a 4-bit GPTQ-quantized export of microsoft/VibeVoice-ASR.

Quantization

Method: GPTQ
Bits: 4
Group size: 128
Logical parameter count: 8,674,021,857
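As a rough illustration of what "Bits: 4, Group size: 128" means for storage, the sketch below quantizes a weight row in groups of 128 values, with one scale and one zero point per group, and computes the effective bits per weight. It uses plain round-to-nearest, not GPTQ's error-compensating column updates. It also assumes an fp16 scale and a 4-bit packed zero point per group (the common AutoGPTQ-style packing), and ignores the g_idx array and the unquantized non-decoder weights. All function names are hypothetical.

```python
import numpy as np

BITS = 4          # matches "Bits: 4" above
GROUP_SIZE = 128  # matches "Group size: 128" above


def quantize_groups(w: np.ndarray, bits: int = BITS, group_size: int = GROUP_SIZE):
    """Asymmetric round-to-nearest quantization of a 1-D row, one scale/zero per group.

    NOT GPTQ: GPTQ picks the integer codes column by column and compensates
    rounding error with second-order information. Only the storage idea
    (integer codes plus per-group scale and zero point) is shared.
    """
    assert w.ndim == 1 and w.size % group_size == 0
    qmax = 2**bits - 1
    groups = w.reshape(-1, group_size)
    # Force each group's range to include 0 so the zero point fits in [0, qmax].
    lo = np.minimum(groups.min(axis=1, keepdims=True), 0)
    hi = np.maximum(groups.max(axis=1, keepdims=True), 0)
    scale = np.maximum(hi - lo, 1e-8) / qmax
    zero = np.round(-lo / scale)
    q = np.clip(np.round(groups / scale) + zero, 0, qmax).astype(np.uint8)
    return q, scale, zero


def dequantize_groups(q: np.ndarray, scale: np.ndarray, zero: np.ndarray) -> np.ndarray:
    """Map integer codes back to floats and flatten to the original row shape."""
    return ((q.astype(np.float32) - zero) * scale).reshape(-1)


def effective_bits_per_weight(bits=BITS, group_size=GROUP_SIZE, scale_bits=16, zero_bits=4):
    """Code bits plus per-group overhead (one scale + one zero point) amortized over the group."""
    return bits + (scale_bits + zero_bits) / group_size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(1024).astype(np.float32)
    q, scale, zero = quantize_groups(w)
    w_hat = dequantize_groups(q, scale, zero)
    print("codes:", q.shape, "max abs error:", float(np.abs(w - w_hat).max()))
    print("effective bits/weight:", effective_bits_per_weight())
```

Under these assumptions the per-group overhead adds 20/128 = 0.15625 bits, for about 4.16 effective bits per quantized weight.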

Repository layout

This model is stored in a split VibeVoice layout:

root directory: VibeVoice audio components and other non-decoder weights
decoder-gptq/: quantized Qwen2 decoder weights

Keep this layout intact when downloading or mirroring the repository.
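A minimal sketch for checking that a downloaded or mirrored copy kept the split layout. It assumes only what the layout above implies: a root config.json and at least one .safetensors file under decoder-gptq/. The exact file list inside decoder-gptq/ is not documented here, and check_split_layout is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

DECODER_SUBDIR = "decoder-gptq"  # quantized Qwen2 decoder, per the layout above


def check_split_layout(root: str | Path) -> list[str]:
    """Return a list of layout problems; an empty list means the copy looks intact."""
    root = Path(root)
    problems: list[str] = []
    if not (root / "config.json").is_file():
        problems.append("missing root config.json")
    decoder = root / DECODER_SUBDIR
    if not decoder.is_dir():
        problems.append(f"missing {DECODER_SUBDIR}/ directory")
    # ASSUMPTION: decoder weights are stored as .safetensors shards.
    elif not any(decoder.glob("*.safetensors")):
        problems.append(f"no .safetensors files in {DECODER_SUBDIR}/")
    return problems
```

To fetch the full repository with its directory structure, huggingface_hub's snapshot_download(repo_id="lemuriandezapada/VibeVoice-ASR-gptq-int4", local_dir=...) can be used, after which the same local_dir can be passed to check_split_layout.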

Metadata

The root config.json includes:

vibevoice_metadata
vibevoice_decoder_model_path
vibevoice_decoder_quantization

These fields identify the split decoder path, record the decoder quantization settings, and preserve the logical source-model metadata.
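The sketch below reads these three fields and resolves the decoder directory. Their value types are not documented here, so it assumes vibevoice_decoder_model_path is a path string relative to the repository root (for example decoder-gptq) and returns the other two fields unparsed. read_split_metadata is a hypothetical helper, not part of VibeVoice or vLLM.

```python
import json
from pathlib import Path

SPLIT_FIELDS = (
    "vibevoice_metadata",
    "vibevoice_decoder_model_path",
    "vibevoice_decoder_quantization",
)


def read_split_metadata(root) -> dict:
    """Load the root config.json and return the split-layout fields."""
    root = Path(root)
    cfg = json.loads((root / "config.json").read_text(encoding="utf-8"))
    missing = [key for key in SPLIT_FIELDS if key not in cfg]
    if missing:
        raise KeyError(f"config.json is missing split-layout fields: {missing}")
    # ASSUMPTION: a relative decoder path is resolved against the repository root.
    decoder_path = Path(cfg["vibevoice_decoder_model_path"])
    if not decoder_path.is_absolute():
        decoder_path = root / decoder_path
    return {
        "decoder_path": decoder_path,
        "quantization": cfg["vibevoice_decoder_quantization"],  # returned as-is
        "metadata": cfg["vibevoice_metadata"],                  # returned as-is
    }
```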

Validation

This GPTQ export was validated against the unquantized upstream VibeVoice-ASR model on short audio samples:

outputs remained valid JSON transcript arrays
output similarity to the full model remained high on tested samples
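Checks of this kind can be reproduced roughly as follows: parse each output as JSON, require a top-level array, join the segment text, and compare against the full model's output with difflib's character-level ratio. The "text" key and the similarity metric are assumptions; the metric actually used for the validation above is not stated, and both function names are hypothetical.

```python
import json
from difflib import SequenceMatcher


def transcript_text(raw: str) -> str:
    """Parse model output as a JSON array and join its segment text.

    Raises ValueError for invalid JSON (JSONDecodeError subclasses ValueError)
    or for a top-level value that is not an array.
    """
    segments = json.loads(raw)
    if not isinstance(segments, list):
        raise ValueError("output is not a JSON array")
    parts = []
    for seg in segments:
        if isinstance(seg, dict):
            # ASSUMPTION: segment text lives under a "text" key; adapt to the real schema.
            parts.append(str(seg.get("text", "")))
        else:
            parts.append(str(seg))
    return " ".join(p.strip() for p in parts if p.strip())


def similarity(reference_raw: str, candidate_raw: str) -> float:
    """Character-level similarity ratio in [0, 1] between two transcripts."""
    ref = transcript_text(reference_raw)
    cand = transcript_text(candidate_raw)
    return SequenceMatcher(None, ref, cand).ratio()
```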

Serving note for vLLM 0.17.x

On current vLLM 0.17.x CUDA builds, this checkpoint is compatible with the Marlin GPTQ path.

prefer letting vLLM infer the backend from config.json
if you must set it explicitly, use gptq_marlin rather than plain gptq
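A small helper that encodes this advice when building a vllm serve command: it passes no --quantization flag by default, so vLLM infers the method, and it refuses plain gptq if a method is forced. vllm serve and --quantization are standard vLLM CLI options; the helper itself is hypothetical, and whether the split layout loads without the patches under patches/vllm_0_17/ is not covered here.

```python
from __future__ import annotations


def vllm_serve_args(model_dir: str, quantization: str | None = None) -> list[str]:
    """Build a `vllm serve` argument list for this checkpoint.

    quantization=None leaves backend selection to vLLM (the preferred option).
    If a method is forced, plain "gptq" is rejected in favour of "gptq_marlin".
    """
    args = ["vllm", "serve", model_dir]
    if quantization is not None:
        if quantization == "gptq":
            raise ValueError('use "gptq_marlin" instead of plain "gptq" for this checkpoint')
        args += ["--quantization", quantization]
    return args
```

For example, subprocess.run(vllm_serve_args(local_dir, "gptq_marlin"), check=True) would launch the server with the Marlin backend forced, where local_dir is a local copy of this repository.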

Note: in current split-VibeVoice testing, this GPTQ export did not show the same VRAM reduction as the AWQ export under vLLM 0.17.x, even when served through the Marlin path. The checkpoint is still published for reproducibility, but AWQ is currently the recommended low-VRAM variant.
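To compare VRAM between the GPTQ and AWQ exports on your own hardware, one option is to read per-GPU memory usage from nvidia-smi once the server has finished loading. This is an assumed measurement method, not necessarily the one behind the note above, and the helper names are hypothetical. Keep gpu_memory_utilization and max_model_len identical between runs: vLLM reserves up to the configured fraction of GPU memory and fills whatever the weights leave free with KV cache, so total usage can look similar even when weight sizes differ.

```python
import subprocess

# Standard nvidia-smi query: one memory.used value (in MiB) per GPU, no header or units.
QUERY = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]


def parse_memory_used_mib(output: str) -> list[int]:
    """Parse the query output above into a list of MiB values, one per GPU."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def memory_used_mib() -> list[int]:
    """Run nvidia-smi and return current memory usage per GPU in MiB."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory_used_mib(out)
```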

Upstream references

Code: https://github.com/microsoft/VibeVoice
Base model: https://huggingface.co/microsoft/VibeVoice-ASR
Report: VibeVoice-ASR Technical Report, https://arxiv.org/pdf/2601.18184

Notes

This is a quantized derivative export, not the original upstream checkpoint.
Base model licensing and usage terms follow the upstream VibeVoice-ASR release.
Pure-VibeVoice compatibility patches for vLLM 0.17.x are included under patches/vllm_0_17/.
