	---
	license: apache-2.0
	language:
	- en
	- zh
	- vi
	base_model:
	- deepseek-ai/DeepSeek-V4-Flash
	tags:
	- deepseek
	- deepseek4
	- deepseekpro
	- llm
	- quantization
	- gguf
	- llama.cpp
	- inference-optimization
	---

	# DeepSeekV4Flash Quantization Repository

	![v4-benchmark-2](https://cdn-uploads.huggingface.co/production/uploads/671ab90d28ec35263e09152f/WXhyPJ5E8r3B2p0TO8us6.png)

	This repository provides scripts and guidelines for quantizing the DeepSeek-V4-Flash model to GGUF with llama.cpp, which reduces model size and can speed up inference.

	![v4-efficiency](https://cdn-uploads.huggingface.co/production/uploads/671ab90d28ec35263e09152f/m4HSN3MmYyW2SHZytAbFE.png)

	---

	## 🚀 Purpose
	- Reduce model size (BF16 → Q3/Q4/Q5/Q8, etc.)
	- Improve inference speed
	- Enable deployment on limited GPU/CPU resources
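
	As a rough illustration of the size reduction, file size scales with bits per weight. The sketch below uses a hypothetical parameter count (not the real size of DeepSeek-V4-Flash) and the block layouts of the simpler llama.cpp formats: Q8_0 stores 32 weights in 34 bytes (8.5 bits per weight) and Q4_0 stores 32 weights in 18 bytes (4.5 bits per weight). K-quant types such as Q4_K_M mix formats per tensor, so their effective bits per weight vary by model. The `estimate_gb` helper is illustrative and not part of this repository.

	```bash
	# Rough size estimate: params (in billions) * bits per weight / 8 = gigabytes.
	# Ignores metadata and any tensors kept at higher precision.
	estimate_gb() {
	  awk -v p="$1" -v bpw="$2" 'BEGIN { printf "%.1f\n", p * bpw / 8 }'
	}

	PARAMS_B=8   # hypothetical parameter count in billions, for illustration only

	echo "BF16: $(estimate_gb "$PARAMS_B" 16) GB"
	echo "Q8_0: $(estimate_gb "$PARAMS_B" 8.5) GB"
	echo "Q4_0: $(estimate_gb "$PARAMS_B" 4.5) GB"
	```

	Replace `PARAMS_B` with the actual parameter count of the model being converted to get a usable estimate of the disk and memory needed.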

	---

	## 🌍 Languages
	- English (en)
	- Chinese (zh)
	- Vietnamese (vi)

	---

	## 🧠 Base Model
	- DeepSeek-V4-Flash (https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash)

	---

	## 📦 Contents
	- Model conversion and quantization scripts
	- Usage examples for llama.cpp / GGUF workflows
	- Common quantization configurations

	---

	## 🛠️ Requirements
	- Python >= 3.12
	- A recent build of llama.cpp (provides `convert_hf_to_gguf.py` and `llama-quantize`)
	- Hugging Face Transformers (if converting from HF format)
	- Enough RAM/VRAM for the chosen model size, plus disk space for the intermediate and quantized GGUF files
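
	The requirements above can be checked with a short script before starting a long conversion. This is a minimal sketch: the function names `version_ge` and `check_requirements` are hypothetical, and it assumes `sort -V` from GNU coreutils and a `llama-quantize` binary either on `PATH` or in the current directory.

	```bash
	# Return 0 if version $1 is greater than or equal to version $2.
	# Relies on `sort -V` (version sort), available in GNU coreutils.
	version_ge() {
	  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
	}

	# Report which requirements appear to be satisfied on this machine.
	check_requirements() {
	  if command -v python3 >/dev/null 2>&1; then
	    py_ver="$(python3 -c 'import platform; print(platform.python_version())')"
	    if version_ge "$py_ver" 3.12; then
	      echo "python $py_ver: ok"
	    else
	      echo "python $py_ver: too old, need >= 3.12"
	    fi
	  else
	    echo "python3: not found"
	  fi
	  if command -v llama-quantize >/dev/null 2>&1 || [ -x ./llama-quantize ]; then
	    echo "llama-quantize: ok"
	  else
	    echo "llama-quantize: not found (build llama.cpp first)"
	  fi
	}
	```

	Run `check_requirements` after sourcing the snippet. It only prints a report and does not install or change anything.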

	---

	## ⚙️ Example Usage

	```bash
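	# Convert a local copy of deepseek-ai/DeepSeek-V4-Flash (placeholder path) to a BF16 GGUF file
	python convert_hf_to_gguf.py path/to/DeepSeek-V4-Flash --outfile models/DeepSeekV4Flash.gguf --outtype bf16

	# Quantize: llama-quantize <input.gguf> <output.gguf> <type>
	./llama-quantize models/DeepSeekV4Flash.gguf models/DeepSeekV4Flash-Q4_K_M.gguf Q4_K_M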
	```
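
	To produce several quantization levels from the same source file, the quantize step can be wrapped in a loop. The sketch below is one possible arrangement, not the repository's own script: `quant_outfile` and `quantize_all` are hypothetical helpers, and the output naming scheme (type appended before `.gguf`) is an assumption. `Q4_K_M`, `Q5_K_M`, and `Q8_0` are llama.cpp quantization type names.

	```bash
	# Build an output path by inserting the quantization type before ".gguf".
	quant_outfile() {
	  echo "${1%.gguf}-$2.gguf"
	}

	# Quantize one source GGUF into several types.
	# With DRY_RUN=1 the commands are printed instead of executed.
	quantize_all() {
	  src="$1"; shift
	  for qtype in "$@"; do
	    out="$(quant_outfile "$src" "$qtype")"
	    if [ "${DRY_RUN:-0}" = "1" ]; then
	      echo "./llama-quantize $src $out $qtype"
	    else
	      ./llama-quantize "$src" "$out" "$qtype" || return 1
	    fi
	  done
	}

	# Print the commands that would run, without executing them.
	DRY_RUN=1 quantize_all models/DeepSeekV4Flash.gguf Q4_K_M Q5_K_M Q8_0
	```

	Dropping `DRY_RUN=1` runs the real quantization. Each output is written next to the source file, so check free disk space first.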

	---

	## 📌 Notes
	- Quantization may require significant system memory depending on model size
	- Some quantization formats may not be compatible with all runtimes or versions
	- Always validate output quality after quantization
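
	Before measuring quality, it helps to confirm that the output is a well-formed GGUF file and to note its format version, since older runtimes may reject newer versions. A GGUF file starts with the 4-byte ASCII magic `GGUF`, followed by a 32-bit unsigned version number. The helpers below are a minimal sketch with hypothetical names, and `gguf_version` assumes a little-endian host.

	```bash
	# Return 0 if the file starts with the 4-byte ASCII magic "GGUF".
	is_gguf() {
	  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
	}

	# Print the GGUF format version: a 32-bit unsigned integer stored right
	# after the magic. Assumes a little-endian host, since od uses native order.
	gguf_version() {
	  od -An -tu4 -j4 -N4 "$1" | tr -d ' \n'
	}
	```

	For example, `is_gguf models/DeepSeekV4Flash-Q4_K_M.gguf && gguf_version models/DeepSeekV4Flash-Q4_K_M.gguf` prints the version only when the magic matches. This checks the container, not output quality. For quality, llama.cpp ships a `llama-perplexity` tool that can be run on the same text with both the unquantized and quantized files to compare results.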

	---

	## 👤 Author

	- Email: tecaprovn@gmail.com
	- Telegram: https://t.me/tamndx

	---

	## 📄 License

	This repository follows the original DeepSeek model license.

	- Base model: Apache 2.0 (DeepSeek)
	- Only conversion scripts are included; this repository does not modify the weights