VibeVoice-ASR GPTQ INT4

This repository contains a 4-bit GPTQ quantized export of microsoft/VibeVoice-ASR.

Quantization

  • Method: GPTQ
  • Bits: 4
  • Group size: 128
  • Logical parameter count: 8,674,021,857
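The settings above imply a storage cost slightly above 4 bits per quantized weight, because each group of 128 weights also carries its own scale and zero point. The sketch below works through that arithmetic. It assumes a 16-bit scale and a 4-bit packed zero point per group, which is a common GPTQ storage layout but is not confirmed for this export. The function names are illustrative only. Only the decoder is quantized here, so applying the figure to the full logical parameter count would overstate the savings.

```python
def effective_bits_per_weight(bits: int, group_size: int,
                              scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Bits stored per quantized weight, including per-group overhead.

    scale_bits and zero_bits are assumptions about the on-disk format,
    not values read from this repository.
    """
    return bits + (scale_bits + zero_bits) / group_size


def quantized_bytes(n_weights: int, bits: int = 4, group_size: int = 128) -> float:
    """Approximate storage in bytes for n_weights quantized weights."""
    return n_weights * effective_bits_per_weight(bits, group_size) / 8


if __name__ == "__main__":
    # 4-bit weights with group size 128, as listed above.
    print(effective_bits_per_weight(4, 128))
```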

Repository layout

This model is stored in a split VibeVoice layout:

  • root directory: VibeVoice audio and non-decoder weights
  • decoder-gptq/: quantized Qwen2 decoder weights

Keep this layout intact when downloading or mirroring the repository.
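As a minimal sketch of a layout-preserving download, the snippet below fetches the whole repository with `huggingface_hub.snapshot_download` and then checks that the two pieces described above are present. The helper names and the choice to check only `config.json` and `decoder-gptq/` are assumptions made for illustration. The repository may contain other files that also matter.

```python
from pathlib import Path

REPO_ID = "lemuriandezapada/VibeVoice-ASR-gptq-int4"


def download_checkpoint(local_dir: str) -> Path:
    """Download the full repository so the split layout is preserved."""
    # Imported lazily so the layout check below works without the package.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))


def missing_layout_entries(root) -> list:
    """Return the expected entries that are absent under root."""
    root = Path(root)
    missing = []
    if not (root / "config.json").is_file():
        missing.append("config.json")
    if not (root / "decoder-gptq").is_dir():
        missing.append("decoder-gptq/")
    return missing
```

Calling `missing_layout_entries(download_checkpoint("./vibevoice-gptq"))` should return an empty list when the mirror is complete. The local directory name is arbitrary.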

Metadata

The root config.json includes:

  • vibevoice_metadata
  • vibevoice_decoder_model_path
  • vibevoice_decoder_quantization

These fields identify the split decoder path and preserve the logical source-model metadata.
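To inspect these fields in a local copy, something like the following should work. It only reads the three keys named above and makes no assumption about what type or structure each value has. The helper name is hypothetical.

```python
import json
from pathlib import Path

SPLIT_KEYS = (
    "vibevoice_metadata",
    "vibevoice_decoder_model_path",
    "vibevoice_decoder_quantization",
)


def read_split_metadata(config_path) -> dict:
    """Return the split-layout fields from config.json (None if a key is absent)."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return {key: config.get(key) for key in SPLIT_KEYS}


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for key, value in read_split_metadata("config.json").items():
        print(f"{key}: {value!r}")
```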

Validation

This GPTQ export was validated against the full upstream VibeVoice-ASR model on short audio samples.

  • outputs remained valid JSON transcript arrays
  • output similarity to the full model remained high on tested samples
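The exact validation procedure and similarity metric are not stated here. As one possible way to reproduce this kind of check, the sketch below tests whether a model output parses as a JSON array and compares two outputs with a character-level ratio from the standard library's `difflib`. Both helper names and the choice of metric are assumptions, not the method used for this export.

```python
import json
from difflib import SequenceMatcher


def is_json_array(text: str) -> bool:
    """True if text parses as JSON and the top-level value is an array."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, list)


def similarity(reference: str, candidate: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in metric only."""
    return SequenceMatcher(None, reference, candidate).ratio()


def compare_outputs(full_output: str, quantized_output: str) -> dict:
    """Summarise one full-model vs. quantized-model output pair."""
    return {
        "quantized_is_json_array": is_json_array(quantized_output),
        "similarity": similarity(full_output, quantized_output),
    }
```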

Serving note for vLLM 0.17.x

On current vLLM 0.17.x CUDA builds, this checkpoint is compatible with the Marlin GPTQ path.

  • prefer letting vLLM infer the backend from config.json
  • if you must set it explicitly, use gptq_marlin rather than plain gptq
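The two options above can be sketched with vLLM's Python API, where `quantization` is an optional argument to `LLM`. The helper below leaves it unset by default so that vLLM infers the backend from `config.json`, and passes `gptq_marlin` only when asked to. The helper names are hypothetical. Whether the split checkpoint loads without first applying the patches shipped in this repository is not confirmed here.

```python
def build_engine_kwargs(model_path: str, force_backend: bool = False) -> dict:
    """Build keyword arguments for vllm.LLM.

    By default no quantization argument is passed, so vLLM picks the
    backend from the checkpoint's config.json.
    """
    kwargs = {"model": model_path}
    if force_backend:
        # Use the Marlin GPTQ path rather than plain "gptq".
        kwargs["quantization"] = "gptq_marlin"
    return kwargs


def load_engine(model_path: str, force_backend: bool = False):
    """Create a vLLM engine; requires vLLM and a supported CUDA GPU."""
    # Imported lazily so build_engine_kwargs can be used without vLLM.
    from vllm import LLM

    return LLM(**build_engine_kwargs(model_path, force_backend))
```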

Note: in split-VibeVoice testing under vLLM 0.17.x, this GPTQ export did not show the same VRAM reduction as the AWQ export, even when served through the Marlin path. The checkpoint is still published for reproducibility, but AWQ is currently the recommended low-VRAM variant.

Upstream references

  • Base model: microsoft/VibeVoice-ASR

Notes

  • This is a quantized derivative export, not the original upstream checkpoint.
  • Base model licensing and usage terms follow the upstream VibeVoice-ASR release.
  • Pure-VibeVoice compatibility patches for vLLM 0.17.x are included under patches/vllm_0_17/.