
---
license: apache-2.0
language:
- en
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE
base_model:
- Qwen/Qwen2.5-Coder-32B-Instruct
pipeline_tag: text-generation
tags:
- code
- chat
- qwen
- qwen-coder
- exl3
---

These are EXL3 quantizations of [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct), which in my view is still the state-of-the-art non-reasoning coding model at the time of writing. Even after the release of Qwen3 and Gemma 3, it remains my go-to model for FIM (fill-in-the-middle) autocompletion.
The quantizations were made with [exllamav3 v0.0.2](https://github.com/turboderp-org/exllamav3/releases/tag/v0.0.2).
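For readers new to FIM, the sketch below shows what a fill-in-the-middle prompt for Qwen2.5-Coder roughly looks like: the code before and after the cursor is wrapped in special tokens, and the model generates the missing middle. The token strings follow the upstream Qwen2.5-Coder documentation; the helper name `build_fim_prompt` is hypothetical, and such a prompt is typically sent as a raw completion prompt rather than through the chat template.

```python
# Minimal sketch of a Qwen2.5-Coder fill-in-the-middle (FIM) prompt.
# Token strings follow the Qwen2.5-Coder docs; `build_fim_prompt` is a
# hypothetical helper, not part of any library.

FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Return a raw prompt asking the model to fill in the code between prefix and suffix."""
    # The model's completion after <|fim_middle|> is the missing middle section.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Code before and after the cursor, as an editor plugin might capture it.
before_cursor = "def fib(n):\n    "
after_cursor = "\n    return fib(n - 1) + fib(n - 2)\n"
prompt = build_fim_prompt(before_cursor, after_cursor)
print(prompt)
```

An autocompletion client would send `prompt` to the loaded model and insert the generated text at the cursor, stopping at an end-of-text token.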

## EXL3 Quantized Models

- [4.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/4.0bpw)
- [6.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/6.0bpw)
- [8.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/8.0bpw)
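Each quantization lives on its own branch of this repository, so one way to fetch a single variant is to pass the branch name as the `revision` of `huggingface_hub.snapshot_download`. This is a minimal sketch, assuming `huggingface_hub` is installed; the helper `download_quant` and the target directory are hypothetical.

```python
# Minimal sketch: download one quantization branch with huggingface_hub.
# Assumes `pip install huggingface_hub`; `download_quant` is a hypothetical helper.

REPO_ID = "LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3"
BRANCHES = ("4.0bpw", "6.0bpw", "8.0bpw")  # branch names from the links above


def download_quant(bpw: str, local_dir: str) -> str:
    """Download the branch for the given bpw (e.g. "6.0bpw") and return its local path."""
    if bpw not in BRANCHES:
        raise ValueError(f"unknown branch {bpw!r}; expected one of {BRANCHES}")
    # Imported lazily so the helper can be defined without the package present.
    from huggingface_hub import snapshot_download

    # Each bpw variant is a separate git branch, selected via `revision`.
    return snapshot_download(repo_id=REPO_ID, revision=bpw, local_dir=local_dir)


# Example (downloads tens of GB):
# path = download_quant("6.0bpw", "./Qwen2.5-Coder-32B-Instruct-exl3-6.0bpw")
```

The returned directory can then be loaded with exllamav3 like any other local model folder.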

For coding, I found that the 6.0bpw or, preferably, the 8.0bpw model combined with KV cache quantization of Q6 or higher works much better than 4.0bpw.
If you only use these models for short autocompletions, 4.0bpw is usable.
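"bpw" is bits per weight, so a rough estimate of the weight footprint for each variant is parameters × bpw / 8 bytes. The sketch below assumes a nominal 32e9 parameters and a uniform bpw across all tensors; the real files will differ somewhat, and the KV cache and activations need additional memory on top of this.

```python
# Back-of-the-envelope weight size: params * bpw / 8 bytes, in decimal GB.
# Assumptions: nominal 32e9 parameters (the "32B" label) and uniform bpw;
# KV cache, activations and per-tensor precision differences are ignored.

NOMINAL_PARAMS = 32e9


def weight_gb(bpw: float, params: float = NOMINAL_PARAMS) -> float:
    """Approximate size of the quantized weights in GB (1e9 bytes)."""
    return params * bpw / 8 / 1e9


for bpw in (4.0, 6.0, 8.0):
    print(f"{bpw:.1f}bpw: ~{weight_gb(bpw):.0f} GB of weights")
```

Under these assumptions the three variants come to about 16, 24 and 32 GB of weights respectively, before any KV cache.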

## Credits

Thanks to the exllamav3 developers for their excellent work.