---
license: apache-2.0
base_model: Qwen/Qwen2.5-Omni-7B
tags:
- lora
- qwen2.5-omni
- cantonese
- speech
- multimodal
---

# Qwen2.5-Omni-7B LoRA: mixed (Cantonese)

LoRA adapter for Qwen2.5-Omni-7B, fine-tuned on **multimodal_yue_benchmark** (Cantonese audio + text) with data from **all three speakers (hiugaai, hiumaan, wanlung)**.

## Base model

Load [Qwen/Qwen2.5-Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B) as `model_name_or_path`, then load this repo on top of it as the PEFT adapter.

## Training

- Framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
- Method: LoRA (r=8), bf16, DeepSpeed ZeRO-2
- Dataset: `wanlung_train`, `hiumaan_train`, `hiugaai_train` (the three single-speaker splits, used together)

## Inference (Transformers + PEFT)

The snippet below loads the base model and attaches this adapter to it.

```python
from transformers import AutoProcessor, Qwen2_5OmniForConditionalGeneration
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-Omni-7B"
adapter = "J017athan/Qwen2.5-Omni-7B-14k-mixed"

# The processor (tokenizer + audio/vision preprocessing) comes from the base model.
processor = AutoProcessor.from_pretrained(base, trust_remote_code=True)

# Load the base weights in bf16 and let Accelerate place them on the available devices.
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Wrap the base model with the LoRA weights from this repo.
model = PeftModel.from_pretrained(model, adapter)
```

Alternatively, in LLaMA-Factory, set `adapter_name_or_path` to this repo.
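
For the LLaMA-Factory route, a minimal sketch of an inference config might look like the following. It is generated from Python with the standard library only. The key names follow the inference examples shipped with LLaMA-Factory; the `template` value `qwen2_omni` and the output file name are assumptions, so check them against your installed version.

```python
from pathlib import Path

# Hypothetical inference settings. Only model_name_or_path and
# adapter_name_or_path come from this model card; the remaining keys
# and values are assumptions modeled on LLaMA-Factory's example configs.
config = {
    "model_name_or_path": "Qwen/Qwen2.5-Omni-7B",
    "adapter_name_or_path": "J017athan/Qwen2.5-Omni-7B-14k-mixed",
    "template": "qwen2_omni",  # assumed template name for Qwen2.5-Omni
    "finetuning_type": "lora",
    "infer_backend": "huggingface",
    "trust_remote_code": True,
}


def to_yaml(cfg):
    """Serialize a flat dict of strings and booleans as simple YAML."""
    lines = []
    for key, value in cfg.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


yaml_text = to_yaml(config)

# Hypothetical file name; any path works.
config_path = Path("qwen2_5_omni_lora_infer.yaml")
config_path.write_text(yaml_text, encoding="utf-8")
print(yaml_text)
```

The resulting file can then be passed to the LLaMA-Factory CLI, for example `llamafactory-cli chat qwen2_5_omni_lora_infer.yaml`.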