metadata
license: apache-2.0
tags:
  - audio
  - speech
  - tokenizer
  - vocoder
  - wavcoch
library_name: transformers

WavCochCausalV64000100M

WavCoch is a causal waveform-to-cochleagram tokenizer by Greta Tuckute and Klemen Kotar.

Model Details

Parameter           Value
Parameters          ~93.05M
Window Size         1001
Hop Length          80
Encoder Dim         1536
Vocabulary Size     64000
Includes Vocoder    False
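
The table does not state units. As a rough sketch, assuming the window size and hop length are counted in samples and the input is sampled at 16 kHz (the rate passed in the usage example below), the frame timing can be worked out as follows. The function names are illustrative and not part of the WavCoch API, and the one-frame-per-hop reading is itself an assumption.

```python
# Back-of-the-envelope timing for the values in the table above.
# ASSUMPTION: window size and hop length are in samples at 16 kHz.
SAMPLING_RATE_HZ = 16_000  # sampling_rate from the usage example
HOP_LENGTH = 80            # "Hop Length" from the table
WINDOW_SIZE = 1001         # "Window Size" from the table


def frames_per_second(sampling_rate_hz: int, hop_length: int) -> float:
    """Number of hops (and, by assumption, frames) per second of audio."""
    return sampling_rate_hz / hop_length


def duration_ms(num_samples: int, sampling_rate_hz: int) -> float:
    """Duration of num_samples samples, in milliseconds."""
    return 1000 * num_samples / sampling_rate_hz


rate = frames_per_second(SAMPLING_RATE_HZ, HOP_LENGTH)
hop_ms = duration_ms(HOP_LENGTH, SAMPLING_RATE_HZ)
window_ms = duration_ms(WINDOW_SIZE, SAMPLING_RATE_HZ)
print(f"{rate:.0f} frames/s, hop {hop_ms:.1f} ms, window {window_ms:.2f} ms")
```

Under these assumptions the model would advance 5 ms per frame, giving 200 frames per second, with each window spanning roughly 62.6 ms.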

Usage

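from transformers import AutoModel

# The model class ships with the repo, so remote code must be trusted.
wavcoch = AutoModel.from_pretrained(
    "TuKoResearch/WavCochCausalV64000100M",
    trust_remote_code=True,
)

# waveform_tensor is your input audio as a tensor, sampled at 16 kHz.
# Quantize the waveform into discrete token codes.
codes = wavcoch.quantize(waveform_tensor)
# Decode the codes back into a cochleagram (not audio).
coch = wavcoch.decode(codes)
# Extract the embedding sequence for probing.
embeddings = wavcoch(
    input_values=waveform_tensor,
    output_hidden_states=True,
    sampling_rate=16000,
).hidden_states[0]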
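
The snippet above leaves waveform_tensor undefined. A minimal sketch for producing a test signal at the expected 16 kHz rate, using only the standard library, might look like this. The helper make_sine and its defaults are hypothetical and only meant for smoke-testing the calls above.

```python
import math
from typing import List

SAMPLING_RATE_HZ = 16_000  # matches sampling_rate in the usage example


def make_sine(
    freq_hz: float,
    duration_s: float,
    sampling_rate_hz: int = SAMPLING_RATE_HZ,
    amplitude: float = 0.5,
) -> List[float]:
    """Return a mono sine wave as a list of floats in [-amplitude, amplitude]."""
    num_samples = int(round(duration_s * sampling_rate_hz))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sampling_rate_hz)
        for i in range(num_samples)
    ]


# One second of a 440 Hz tone: 16000 samples at 16 kHz.
samples = make_sine(440.0, 1.0)
```

One way to turn this into a tensor is torch.tensor(samples).unsqueeze(0). The batch-first shape (1, num_samples) is an assumption here, so check the repo's remote code for the layout the model actually expects.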

Notes

This repo contains the WavCoch tokenizer/autoencoder only. Audio decoding requires a vocoder-enabled checkpoint.

When called with output_hidden_states=True, WavCoch exposes a single hidden-state layer: the post-FSQ projected embedding sequence used for direct probing.