mHuBERT-147 IPA Linear CTC FT
Fine-tuned English IPA phone-recognition model initialized from
utter-project/mHuBERT-147 and trained with a compact linear CTC head.
This repository contains the full fine-tuned model:
- mHuBERT-147 backbone
- linear CTC head
- audio preprocessor config
- model size: about 94.4M backbone parameters + 35k linear-head parameters (a quick check is sketched below)
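The size figures are not broken out in the config; here is a minimal sketch for checking them yourself, assuming the repo's custom model class behaves like a standard `torch.nn.Module`:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "istomin9192/mHuBERT-147-ipa-linear-ctc-ft", trust_remote_code=True
)
# Total parameter count; should land near the 94.4M + 35k reported above.
total = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total / 1e6:.2f}M")
```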
Training setup:
- initialized from utter-project/mHuBERT-147
- top 4 encoder layers fine-tuned
- trained on TIMIT train + Buckeye train
- linear CTC head on top of frame embeddings (a minimal sketch of this setup follows the list)
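The card ships no training code; the following is a minimal sketch of what the described setup could look like, assuming a standard `HubertModel` backbone from transformers. `num_phones` is a hypothetical placeholder; the real vocabulary comes from ipa_map.json.

```python
import torch
from transformers import HubertModel

backbone = HubertModel.from_pretrained("utter-project/mHuBERT-147")

# Freeze the whole backbone, then unfreeze only the top 4 encoder layers.
for p in backbone.parameters():
    p.requires_grad = False
for layer in backbone.encoder.layers[-4:]:
    for p in layer.parameters():
        p.requires_grad = True

# Linear CTC head over frame embeddings: hidden size -> phones + 1 blank.
num_phones = 50  # hypothetical; use len(id2phone) from ipa_map.json
head = torch.nn.Linear(backbone.config.hidden_size, num_phones + 1)

# CTC loss with the blank at the last index (see Notes below).
ctc_loss = torch.nn.CTCLoss(blank=num_phones)

# One step, shapes only (a real loop needs a dataloader and phone targets):
# hidden = backbone(input_values).last_hidden_state          # (B, T, H)
# log_probs = head(hidden).log_softmax(-1).transpose(0, 1)   # (T, B, V)
# loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
```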
Validation results from the fine-tuning run:
- TIMIT test: PER = 0.1012
- Buckeye val: PER = 0.2082
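PER here is the standard phone-level edit distance normalized by the reference length; the card does not include the scoring script, so the following is a self-contained sketch of the metric:

```python
def phone_error_rate(ref: list[str], hyp: list[str]) -> float:
    """Levenshtein distance between phone sequences, divided by len(ref)."""
    d = list(range(len(hyp) + 1))  # DP row: distances against an empty ref
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)] / len(ref)

# One substitution out of four reference phones -> PER = 0.25
print(phone_error_rate(["h", "ə", "l", "oʊ"], ["h", "ɛ", "l", "oʊ"]))
```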
Notes:
- The output vocabulary is the same IPA set as in istomin9192/mHuBERT-147-ipa-head, with one extra CTC blank symbol at the last output index.
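A quick way to confirm this blank convention, assuming ipa_map.json is downloaded locally (the config key mirrors the loading example below):

```python
import json
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "istomin9192/mHuBERT-147-ipa-linear-ctc-ft", trust_remote_code=True
)
with open("ipa_map.json", "r", encoding="utf-8") as f:
    id2phone = json.load(f)["id2phone"]

# Phones occupy indices 0..len(id2phone)-1; the blank takes the extra
# last slot, so its id equals the phone count.
assert model.config.architecture["blank_id"] == len(id2phone)
```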
Minimal loading example:
```python
import json

import librosa
import torch
from transformers import AutoFeatureExtractor, AutoModel

repo_id = "istomin9192/mHuBERT-147-ipa-linear-ctc-ft"
feature_extractor = AutoFeatureExtractor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# id-to-phone mapping (ipa_map.json from the repository files)
with open("ipa_map.json", "r", encoding="utf-8") as f:
    id2phone = {int(k): v for k, v in json.load(f)["id2phone"].items()}

# load any speech file as 16 kHz mono
wav_file = "example.wav"  # placeholder: path to your audio file
wav, sr = librosa.load(wav_file, sr=16000, mono=True)
inputs = feature_extractor(wav, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0]  # (frames, vocab)
pred_ids = logits.argmax(dim=-1).tolist()

# greedy CTC decoding: drop blanks and collapse repeated ids
blank_id = model.config.architecture["blank_id"]
phones = []
prev = blank_id
for pid in pred_ids:
    if pid != blank_id and pid != prev:
        phones.append(id2phone[pid])
    prev = pid

print(phones)
```
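The greedy path also gives rough onset times: HuBERT's convolutional front end produces one frame per 20 ms of 16 kHz audio, so each emission's frame index can be scaled into seconds. A sketch extending the example above (the 20 ms stride is the standard HuBERT value, not stated in this card):

```python
# Rough per-phone onset times from the same greedy path.
FRAME_STRIDE_S = 0.02  # standard HuBERT frame rate at 16 kHz input
timed = []
prev = blank_id
for t, pid in enumerate(pred_ids):
    if pid != blank_id and pid != prev:
        timed.append((round(t * FRAME_STRIDE_S, 2), id2phone[pid]))
    prev = pid
print(timed)
```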