Update README.md

README.md CHANGED
tags:
- gguf
- llama.cpp
- inference-optimization
---

# DeepSeekV4Flash Quantization Repository

This repository provides scripts and guidelines for quantizing the **DeepSeek V4 Flash** model, reducing its size and improving its inference performance.

---

## 🚀 Purpose
- Reduce model size (BF16 → Q3/Q4/Q5/Q8, etc.; see the size sketch below)
- Improve inference speed
- Enable deployment on limited GPU/CPU resources
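
For a rough sense of the savings, a back-of-envelope sketch (the 16B parameter count and the bits-per-weight figures are illustrative, and real GGUF files also carry metadata):

```bash
# size_GB ≈ params_in_billions × bits_per_weight / 8
# For a hypothetical 16B-parameter model:
#   BF16   (16 bpw)   → 16 × 16  / 8 = 32 GB
#   Q8_0   (~8.5 bpw) → 16 × 8.5 / 8 = 17 GB
#   Q4_K_M (~4.8 bpw) → 16 × 4.8 / 8 ≈ 9.6 GB
awk 'BEGIN { for (bpw = 16; bpw >= 4; bpw /= 2) printf "%4.1f bpw -> %4.1f GB\n", bpw, 16 * bpw / 8 }'
```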

---

## 🌍 Languages
- English (en)
- Vietnamese (vi)

---

## 🧠 Base Model
- DeepSeek-V4-Flash (https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash)

---

## 📦 Contents
- Model conversion and quantization scripts
- Usage examples for llama.cpp / GGUF workflows
- Common quantization configurations

---

## 🛠️ Requirements
- Python >= 3.12
- A recent llama.cpp build with GGUF support (see the build sketch below)
- Hugging Face Transformers (if converting from HF format)
- Sufficient RAM/VRAM for the chosen model size
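
One way to satisfy the llama.cpp requirement is a standard source build (upstream's documented CMake flow; binary locations can differ between releases):

```bash
# Fetch and build llama.cpp; llama-quantize and friends land in build/bin/
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Python dependencies for the conversion scripts (gguf, transformers, ...)
pip install -r requirements.txt
```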

---

## ⚙️ Example Usage
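
The conversion script expects a local model directory, so first fetch the checkpoint if it is not already on disk (assuming the `huggingface-cli` tool from `huggingface_hub` is installed; the target directory is just an example):

```bash
# Download the HF checkpoint into a local directory for conversion
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash --local-dir models/DeepSeek-V4-Flash
```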

```bash
# Convert the HF checkpoint to GGUF; convert_hf_to_gguf.py takes the
# model directory as a positional argument
python convert_hf_to_gguf.py models/DeepSeek-V4-Flash --outfile models/DeepSeekV4Flash.gguf

# Quantize: llama-quantize <input.gguf> <output.gguf> <type>
./llama-quantize models/DeepSeekV4Flash.gguf models/DeepSeekV4Flash-Q4_K_M.gguf Q4_K_M
```
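
A quick smoke test of the quantized file (`llama-cli` is part of a standard llama.cpp build; prompt and token count are arbitrary):

```bash
# Load the quantized model and generate a short completion
./llama-cli -m models/DeepSeekV4Flash-Q4_K_M.gguf -p "Hello" -n 32
```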

---

## 📌 Notes
- Quantization may require significant system memory, depending on model size
- Some quantization formats may not be compatible with all runtimes or versions
- Always validate output quality after quantization (see the perplexity check below)
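
One concrete validation is comparing perplexity before and after quantization; llama.cpp ships a `llama-perplexity` tool for this (the evaluation text file below is a placeholder):

```bash
# Run the same held-out text through both models and compare perplexities
./llama-perplexity -m models/DeepSeekV4Flash.gguf -f path/to/eval.txt
./llama-perplexity -m models/DeepSeekV4Flash-Q4_K_M.gguf -f path/to/eval.txt
```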

---

## 👤 Author
- Email: tecaprovn@gmail.com
- Telegram: https://t.me/tamndx

---

## 📄 License

This repository follows the original DeepSeek model license.

- Base model: Apache 2.0 (DeepSeek)
- Only conversion scripts are included; model weights are not modified