mHuBERT-147 IPA Linear CTC FT
Fine-tuned English IPA phone-recognition model initialized from
utter-project/mHuBERT-147 and trained with a compact linear CTC head.
This repository contains the full fine-tuned model:
- mHuBERT-147 backbone
- linear CTC head
- audio preprocessor config
- model size: about 94.4M backbone parameters + 35k linear-head parameters (a quick check is sketched below)
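The size figures are not broken out in the config; here is a minimal sketch for checking them yourself, assuming the repo's custom model class behaves like a standard `torch.nn.Module`:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "istomin9192/mHuBERT-147-ipa-linear-ctc-ft", trust_remote_code=True
)
# Total parameter count; should land near the 94.4M + 35k reported above.
total = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total / 1e6:.2f}M")
```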
Training setup:
- initialized from utter-project/mHuBERT-147
- top 4 encoder layers fine-tuned
- trained on TIMIT train + Buckeye train
- linear CTC head on top of frame embeddings (a minimal sketch of this setup follows the list)
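The card ships no training code; the following is a minimal sketch of what the described setup could look like, assuming a standard `HubertModel` backbone from transformers. `num_phones` is a hypothetical placeholder; the real vocabulary comes from ipa_map.json.

```python
import torch
from transformers import HubertModel

backbone = HubertModel.from_pretrained("utter-project/mHuBERT-147")

# Freeze the whole backbone, then unfreeze only the top 4 encoder layers.
for p in backbone.parameters():
    p.requires_grad = False
for layer in backbone.encoder.layers[-4:]:
    for p in layer.parameters():
        p.requires_grad = True

# Linear CTC head over frame embeddings: hidden size -> phones + 1 blank.
num_phones = 50  # hypothetical; use len(id2phone) from ipa_map.json
head = torch.nn.Linear(backbone.config.hidden_size, num_phones + 1)

# CTC loss with the blank at the last index (see Notes below).
ctc_loss = torch.nn.CTCLoss(blank=num_phones)

# One step, shapes only (a real loop needs a dataloader and phone targets):
# hidden = backbone(input_values).last_hidden_state          # (B, T, H)
# log_probs = head(hidden).log_softmax(-1).transpose(0, 1)   # (T, B, V)
# loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
```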
Validation results from the fine-tuning run:
- TIMIT test: PER = 0.1012
- Buckeye val: PER = 0.2082
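PER here is the standard phone-level edit distance normalized by the reference length; the card does not include the scoring script, so the following is a self-contained sketch of the metric:

```python
def phone_error_rate(ref: list[str], hyp: list[str]) -> float:
    """Levenshtein distance between phone sequences, divided by len(ref)."""
    d = list(range(len(hyp) + 1))  # DP row: distances against an empty ref
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)] / len(ref)

# One substitution out of four reference phones -> PER = 0.25
print(phone_error_rate(["h", "ə", "l", "oʊ"], ["h", "ɛ", "l", "oʊ"]))
```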
Notes:
- The output vocabulary is the same IPA set as in istomin9192/mHuBERT-147-ipa-head, with one extra CTC blank symbol at the last output index.
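A quick way to confirm this blank convention, assuming ipa_map.json is downloaded locally (the config key mirrors the loading example below):

```python
import json
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "istomin9192/mHuBERT-147-ipa-linear-ctc-ft", trust_remote_code=True
)
with open("ipa_map.json", "r", encoding="utf-8") as f:
    id2phone = json.load(f)["id2phone"]

# Phones occupy indices 0..len(id2phone)-1; the blank takes the extra
# last slot, so its id equals the phone count.
assert model.config.architecture["blank_id"] == len(id2phone)
```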
Minimal loading example:
```python
import json

import librosa
import torch
from transformers import AutoFeatureExtractor, AutoModel

repo_id = "istomin9192/mHuBERT-147-ipa-linear-ctc-ft"
feature_extractor = AutoFeatureExtractor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# id-to-phone mapping (ipa_map.json from the repository files)
with open("ipa_map.json", "r", encoding="utf-8") as f:
    id2phone = {int(k): v for k, v in json.load(f)["id2phone"].items()}

# load any speech file as 16 kHz mono
wav_file = "example.wav"  # placeholder: path to your audio file
wav, sr = librosa.load(wav_file, sr=16000, mono=True)
inputs = feature_extractor(wav, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0]  # (frames, vocab)
pred_ids = logits.argmax(dim=-1).tolist()

# greedy CTC decoding: drop blanks and collapse repeated ids
blank_id = model.config.architecture["blank_id"]
phones = []
prev = blank_id
for pid in pred_ids:
    if pid != blank_id and pid != prev:
        phones.append(id2phone[pid])
    prev = pid

print(phones)
```
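The greedy path also gives rough onset times: HuBERT's convolutional front end produces one frame per 20 ms of 16 kHz audio, so each emission's frame index can be scaled into seconds. A sketch extending the example above (the 20 ms stride is the standard HuBERT value, not stated in this card):

```python
# Rough per-phone onset times from the same greedy path.
FRAME_STRIDE_S = 0.02  # standard HuBERT frame rate at 16 kHz input
timed = []
prev = blank_id
for t, pid in enumerate(pred_ids):
    if pid != blank_id and pid != prev:
        timed.append((round(t * FRAME_STRIDE_S, 2), id2phone[pid]))
    prev = pid
print(timed)
```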