---
license: apache-2.0
base_model: Qwen/Qwen2.5-Omni-7B
tags:
- lora
- qwen2.5-omni
- cantonese
- speech
- multimodal
---
# Qwen2.5-Omni-7B LoRA — mixed (Cantonese)
LoRA adapter fine-tuned on **multimodal_yue_benchmark** (Cantonese audio + text), covering all three speakers (**hiugaai**/**hiumaan**/**wanlung**).
## Base model
Use [Qwen/Qwen2.5-Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B) as `model_name_or_path`, then load this repo on top of it as the PEFT adapter.
## Training
- Framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
- Method: LoRA (r=8), bf16, DeepSpeed ZeRO-2 (a config sketch follows this list)
- Dataset: `hiugaai_train` / `hiumaan_train` / `wanlung_train` (the three single-speaker splits, combined for this mixed adapter)
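For reference, a minimal LLaMA-Factory SFT config consistent with the settings above might look like the sketch below. Only the LoRA rank, bf16, and ZeRO-2 settings come from this card; the template name, cutoff length, batch size, and learning rate are illustrative assumptions, and the dataset names must be registered in LLaMA-Factory's `dataset_info.json`.

```yaml
# Hedged sketch of a LLaMA-Factory training config; values marked
# "assumption" are not stated in this card.
model_name_or_path: Qwen/Qwen2.5-Omni-7B
trust_remote_code: true

stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8                    # stated in the card
lora_target: all                # assumption: adapt all linear layers

dataset: hiugaai_train,hiumaan_train,wanlung_train  # must exist in dataset_info.json
template: qwen2_omni            # assumption: name varies by LLaMA-Factory version
cutoff_len: 2048                # assumption

output_dir: saves/qwen2.5-omni-7b-lora-mixed
bf16: true                      # stated in the card
deepspeed: examples/deepspeed/ds_z2_config.json     # ZeRO-2, as stated
per_device_train_batch_size: 1  # assumption
gradient_accumulation_steps: 8  # assumption
learning_rate: 1.0e-4           # assumption
num_train_epochs: 3.0           # assumption
```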
## Inference (Transformers + PEFT)
```python
from transformers import AutoProcessor, Qwen2_5OmniForConditionalGeneration
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-Omni-7B"
adapter = "J017athan/Qwen2.5-Omni-7B-14k-mixed"

# Load the base processor and model, then attach this LoRA adapter on top.
processor = AutoProcessor.from_pretrained(base, trust_remote_code=True)
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter)
```
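With the adapter attached, a transcription call could look like the sketch below. The audio path and prompt are placeholders, and `process_mm_info` comes from the Qwen team's `qwen-omni-utils` helper package (`pip install qwen-omni-utils`).

```python
from qwen_omni_utils import process_mm_info  # pip install qwen-omni-utils

# Placeholder conversation; replace the path with your own Cantonese clip.
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio": "sample_cantonese.wav"},
        {"type": "text", "text": "Transcribe this Cantonese audio."},
    ]},
]

text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=False)
inputs = processor(text=text, audio=audios, images=images, videos=videos,
                   return_tensors="pt", padding=True).to(model.device)

# return_audio=False skips speech synthesis and returns text token ids only.
output_ids = model.generate(**inputs, max_new_tokens=256, return_audio=False)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```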
Alternatively, point LLaMA-Factory at this repo via `adapter_name_or_path`, for example with `llamafactory-cli chat`.
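A minimal chat config sketch (the template name is an assumption and depends on your LLaMA-Factory version):

```yaml
# Hedged sketch for `llamafactory-cli chat chat_config.yaml`.
model_name_or_path: Qwen/Qwen2.5-Omni-7B
adapter_name_or_path: J017athan/Qwen2.5-Omni-7B-14k-mixed
template: qwen2_omni   # assumption: check your LLaMA-Factory version
finetuning_type: lora
trust_remote_code: true
```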