MisoTTS int8 (BigBlueCeiling)

A weight-only int8 quantization of BigBlueCeiling/MisoTTS-bf16, produced with torchao (int8_weight_only). Only the backbone/decoder Linear layers are quantized; the embeddings, output heads, and projection stay bf16.

Experimental. Weight-only int8; bf16 remains the reference.
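
To make "weight-only int8" concrete, here is a minimal pure-Python sketch of the general technique: each weight row is stored as int8 values plus one float scale, and is dequantized before the matmul. It assumes symmetric, per-output-row absmax scaling. It is an illustration only, not torchao's implementation, and the helper names (quantize_row, dequantize_row, linear_weight_only_int8) are made up for this example.

```python
def quantize_row(row):
    """Symmetric int8 quantization of one weight row: returns (int values, scale)."""
    absmax = max(abs(w) for w in row)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from int8 values and their scale."""
    return [v * scale for v in q]


def linear_weight_only_int8(x, q_rows, scales):
    """Compute y = W x where W is stored as int8 rows plus per-row scales."""
    out = []
    for q, s in zip(q_rows, scales):
        w = dequantize_row(q, s)  # dequantize first; the matmul itself runs in float
        out.append(sum(xi * wi for xi, wi in zip(x, w)))
    return out


# Toy 2x3 weight matrix and one input vector (values chosen arbitrarily).
weights = [[0.50, -1.00, 0.25], [0.02, 0.01, -0.04]]
quantized = [quantize_row(r) for r in weights]
q_rows = [q for q, _ in quantized]
scales = [s for _, s in quantized]

x = [1.0, 2.0, 3.0]
y_ref = [sum(xi * wi for xi, wi in zip(x, r)) for r in weights]
y_q = linear_weight_only_int8(x, q_rows, scales)
max_err = max(abs(a - b) for a, b in zip(y_ref, y_q))
print("max abs error:", max_err)
```

Activations stay in floating point throughout; only the stored weights shrink, which is why the benefit is memory rather than arithmetic.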

What it is for

Lowering the hardware floor. Quantization here is a memory lever, not a speed lever: MisoTTS decodes one frame at a time, and those tiny per-step matmuls cannot keep the GPU's low-precision tensor cores fed, so the int8 weights are dequantized to bf16 for each matmul. You get the VRAM saving, not a throughput win.

  • Fits: ~16 GB VRAM cards (RTX 4060 Ti 16G, 4070 Ti Super, A4000, ...)
  • Quality: preserved. Mean CER 0.11, WER 0.14, UTMOS 3.96, statistically even with bf16 (CER 0.10 / WER 0.15 / UTMOS 3.94).
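
As a rough back-of-the-envelope check on the memory claim, the sketch below estimates weight storage only. Both inputs are assumptions, not figures from this card: the parameter count reads the "8b" in load_miso_8b as roughly 8 billion, and the share of parameters living in quantized Linear layers (QUANT_FRACTION) is a hypothetical placeholder. Per-row scales, activations, the KV cache, and framework overhead are ignored.

```python
GIB = 1024 ** 3


def weight_bytes(n_params, quantized_fraction, bytes_quant=1, bytes_ref=2):
    """Bytes of weight storage when a fraction of parameters is int8 and the rest bf16."""
    quantized = n_params * quantized_fraction * bytes_quant
    kept = n_params * (1 - quantized_fraction) * bytes_ref
    return quantized + kept


N_PARAMS = 8e9        # assumption: "8b" in load_miso_8b taken as ~8 billion parameters
QUANT_FRACTION = 0.9  # hypothetical share of parameters in the quantized Linear layers

bf16_gib = weight_bytes(N_PARAMS, 0.0) / GIB
int8_gib = weight_bytes(N_PARAMS, QUANT_FRACTION) / GIB

print(f"bf16 weights: {bf16_gib:.1f} GiB")
print(f"int8 weights: {int8_gib:.1f} GiB")
```

Under these assumptions the bf16 weights alone nearly fill a 16 GB card, while the int8 checkpoint leaves headroom for everything else the runtime needs.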

Use

This checkpoint is a torch.save'd torchao state_dict (model.pt). The serving core in the MisoTTS repo pulls it automatically when GPU-sense detects a matching VRAM tier. To load it directly:

from generator import load_miso_8b  # from the MisoTTS repo
gen = load_miso_8b("cuda", model_path_or_repo_id="BigBlueCeiling/MisoTTS-int8",
                   prequantized=True)

Requires torch>=2.7 and a matching torchao. Loading unpickles a torchao tensor subclass, so weights_only=False is used; load only checkpoints you trust.
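
One way to check the torch>=2.7 requirement before loading is sketched below. The helpers (version_tuple, meets_minimum, check_torch) are hypothetical names for this example, not part of the MisoTTS repo. The comparison looks only at the leading numeric release components, so local tags such as "+cu126" and pre-release suffixes are ignored; it does not verify that torchao matches.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Leading numeric release components, e.g. '2.7.1+cu126' -> (2, 7, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """True if the installed release is at least the minimum release."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_torch(minimum="2.7"):
    """True if torch is installed and its version satisfies the minimum."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)


if __name__ == "__main__":
    print("torch requirement met:", check_torch())
```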

Model and original inference code are MisoLabs' work; see the upstream license.
