---
license: apache-2.0
base_model: Qwen/Qwen2.5-Omni-7B
tags:
- lora
- qwen2.5-omni
- cantonese
- speech
- multimodal
---

# Qwen2.5-Omni-7B LoRA — mixed (Cantonese)
|
|
A LoRA adapter for Qwen2.5-Omni-7B fine-tuned on **multimodal_yue_benchmark** (Cantonese audio + text), covering all three speakers (**hiugaai**, **hiumaan**, **wanlung**).
|
|
## Base model
|
|
Load [Qwen/Qwen2.5-Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B) as `model_name_or_path`, then load this repo on top as the PEFT adapter.
|
|
## Training
|
|
- Framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
- Method: LoRA (r=8), bf16, DeepSpeed ZeRO-2
- Dataset: `wanlung_train` / `hiumaan_train` / `hiugaai_train` (the three single-speaker splits, combined)
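
A training recipe along these lines can be expressed as a LLaMA-Factory YAML config. This is a sketch only: the dataset names come from this card, but the output path, cutoff length, batch size, learning rate, and epoch count are illustrative placeholders, not the values actually used.

```yaml
### model
model_name_or_path: Qwen/Qwen2.5-Omni-7B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8

### dataset
dataset: wanlung_train,hiumaan_train,hiugaai_train
cutoff_len: 2048                             # illustrative

### train
output_dir: saves/qwen2.5-omni-7b-lora-yue   # illustrative
per_device_train_batch_size: 1               # illustrative
learning_rate: 1.0e-4                        # illustrative
num_train_epochs: 3.0                        # illustrative
bf16: true
deepspeed: examples/deepspeed/ds_z2_config.json
```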
|
|
## Inference (Transformers + PEFT)
|
|
```python
from transformers import AutoProcessor, Qwen2_5OmniForConditionalGeneration
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-Omni-7B"
adapter = "J017athan/Qwen2.5-Omni-7B-14k-mixed"

processor = AutoProcessor.from_pretrained(base, trust_remote_code=True)
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
# Attach the LoRA weights from this repo on top of the base model.
model = PeftModel.from_pretrained(model, adapter)
```
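
With the adapter loaded, a transcription-style call can be sketched as follows. This assumes the `process_mm_info` helper from the `qwen_omni_utils` package that accompanies the Qwen2.5-Omni release; the audio path and prompt are placeholders.

```python
from qwen_omni_utils import process_mm_info  # helper from the Qwen2.5-Omni release

# Hypothetical Cantonese clip; replace with a real audio file path.
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio": "sample_yue.wav"},
        {"type": "text", "text": "請將呢段音頻轉寫成文字。"},  # "Transcribe this audio."
    ]},
]

text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=False)
inputs = processor(text=text, audio=audios, return_tensors="pt", padding=True).to(model.device)

# Text-only decoding; pass return_audio=True to also get synthesized speech.
text_ids = model.generate(**inputs, max_new_tokens=256, return_audio=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
```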
|
|
Alternatively, in LLaMA-Factory set `adapter_name_or_path` to this repo.
|
|