---
license: apache-2.0
base_model: Qwen/Qwen2.5-Omni-7B
tags:
- lora
- qwen2.5-omni
- cantonese
- speech
- multimodal
---

# Qwen2.5-Omni-7B LoRA — mixed (Cantonese)
|
|
A LoRA adapter for Qwen2.5-Omni-7B fine-tuned on **multimodal_yue_benchmark** (Cantonese audio + text), covering all three speakers (**hiugaai**, **hiumaan**, **wanlung**).
|
|
## Base model
|
|
Load [Qwen/Qwen2.5-Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B) as `model_name_or_path`, then load this repo on top as the PEFT adapter.
|
|
## Training
|
|
- Framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
- Method: LoRA (r=8), bf16, DeepSpeed ZeRO-2
- Dataset: `wanlung_train` / `hiumaan_train` / `hiugaai_train` (the three single-speaker splits, combined)
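
A training recipe along these lines can be expressed as a LLaMA-Factory YAML config. This is a sketch only: the dataset names come from this card, but the output path, cutoff length, batch size, learning rate, and epoch count are illustrative placeholders, not the values actually used.

```yaml
### model
model_name_or_path: Qwen/Qwen2.5-Omni-7B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8

### dataset
dataset: wanlung_train,hiumaan_train,hiugaai_train
cutoff_len: 2048                             # illustrative

### train
output_dir: saves/qwen2.5-omni-7b-lora-yue   # illustrative
per_device_train_batch_size: 1               # illustrative
learning_rate: 1.0e-4                        # illustrative
num_train_epochs: 3.0                        # illustrative
bf16: true
deepspeed: examples/deepspeed/ds_z2_config.json
```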
|
|
## Inference (Transformers + PEFT)
|
|
```python
from transformers import AutoProcessor, Qwen2_5OmniForConditionalGeneration
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-Omni-7B"
adapter = "J017athan/Qwen2.5-Omni-7B-14k-mixed"

processor = AutoProcessor.from_pretrained(base, trust_remote_code=True)
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
# Attach the LoRA weights from this repo on top of the base model.
model = PeftModel.from_pretrained(model, adapter)
```
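
With the adapter loaded, a transcription-style call can be sketched as follows. This assumes the `process_mm_info` helper from the `qwen_omni_utils` package that accompanies the Qwen2.5-Omni release; the audio path and prompt are placeholders.

```python
from qwen_omni_utils import process_mm_info  # helper from the Qwen2.5-Omni release

# Hypothetical Cantonese clip; replace with a real audio file path.
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio": "sample_yue.wav"},
        {"type": "text", "text": "請將呢段音頻轉寫成文字。"},  # "Transcribe this audio."
    ]},
]

text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=False)
inputs = processor(text=text, audio=audios, return_tensors="pt", padding=True).to(model.device)

# Text-only decoding; pass return_audio=True to also get synthesized speech.
text_ids = model.generate(**inputs, max_new_tokens=256, return_audio=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
```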
|
|
Alternatively, in LLaMA-Factory set `adapter_name_or_path` to this repo.
|
|