metadata
license: apache-2.0
base_model: Qwen/Qwen2.5-Omni-7B
tags:
- lora
- qwen2.5-omni
- cantonese
- speech
- multimodal
Qwen2.5-Omni-7B LoRA — mixed (Cantonese)
LoRA adapter fine-tuned on multimodal_yue_benchmark (Cantonese audio + text), speaker all three (hiugaai/hiumaan/wanlung).
Base model
Load with Qwen/Qwen2.5-Omni-7B as model_name_or_path, then load this repo as the PEFT adapter.
Training
- Framework: LLaMA-Factory
- Method: LoRA (r=8), bf16, DeepSpeed ZeRO-2
- Dataset:
wanlung_train/hiumaan_train/hiugaai_train(single-speaker split)
Inference (Transformers + PEFT)
from transformers import AutoProcessor, Qwen2_5OmniForConditionalGeneration
from peft import PeftModel
import torch
base = "Qwen/Qwen2.5-Omni-7B"
adapter = "J017athan/Qwen2.5-Omni-7B-14k-mixed"
processor = AutoProcessor.from_pretrained(base, trust_remote_code=True)
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
base, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter)
Or use LLaMA-Factory adapter_name_or_path pointing to this repo.