# DeepSeekV4Flash Quantization Repository
This repository provides scripts and guidelines for quantizing the **DeepSeek V4 Flash** model, enabling reduced model size and optimized inference performance.
---
## 🚀 Purpose
- Reduce model size (BF16 → Q3/Q4/Q5/Q8, etc.)
- Improve inference speed
- Enable deployment on limited GPU/CPU resources
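As a back-of-envelope guide to the size reduction, on-disk size is roughly `parameters × bits-per-weight / 8`. The sketch below uses approximate average bits-per-weight for common llama.cpp quant types (real files vary slightly by tensor mix), and the 7B parameter count is a placeholder, not this model's size:

```python
# Rough size estimate: bytes ≈ n_params * bits_per_weight / 8.
# Bits-per-weight values are approximate llama.cpp averages (assumption),
# since K-quants mix precisions across tensors.

BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
}

def estimate_size_gb(n_params: float, quant: str) -> float:
    """Approximate on-disk size in GiB for n_params weights at the given quant."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / (1024 ** 3)

if __name__ == "__main__":
    n = 7e9  # hypothetical 7B-parameter model, for illustration only
    for q in BITS_PER_WEIGHT:
        print(f"{q:8s} ~{estimate_size_gb(n, q):6.1f} GiB")
```

This is only a planning aid; check the actual file size produced by `llama-quantize` for your hardware budget.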
---
## 🌍 Languages
- English (en)
- Chinese (zh)
- Vietnamese (vi)
---
## 🧠 Base Model
- DeepSeek-V4-Flash (https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash)


---
## 📦 Contents
- Model conversion and quantization scripts
- Usage examples for llama.cpp / GGUF workflows
- Common quantization configurations
---
## 🛠️ Requirements
- Python >= 3.12
- Latest version of llama.cpp (with GGUF support)
- HuggingFace Transformers (if converting from HF format)
- Sufficient RAM/VRAM depending on model size
---
## ⚙️ Example Usage
```bash
# Convert the HF checkpoint (local model directory) to GGUF
python convert_hf_to_gguf.py ./DeepSeek-V4-Flash --outfile models/DeepSeekV4Flash.gguf

# Quantize to Q4_K_M (input file, output file, quant type)
./llama-quantize models/DeepSeekV4Flash.gguf models/DeepSeekV4Flash-Q4_K_M.gguf Q4_K_M
```
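After conversion, a quick structural sanity check is possible before spending time on quantization: every GGUF file starts with the ASCII magic `GGUF` followed by a little-endian `uint32` version. A minimal sketch (the path is a placeholder for your converted file):

```python
# Sanity-check a converted file: valid GGUF starts with magic b"GGUF"
# followed by a little-endian uint32 format version.

import struct

def read_gguf_header(path: str) -> int:
    """Return the GGUF format version, or raise ValueError on a bad magic."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        return version
```

Usage: `read_gguf_header("models/DeepSeekV4Flash.gguf")` should return a small integer version number rather than raising.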
---
## 📌 Notes
- Quantization may require significant system memory depending on model size
- Some quantization formats may not be compatible with all runtimes or versions
- Always validate output quality after quantization
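One common way to validate output quality is to compare perplexity of the original and quantized models on the same held-out text. The sketch below only illustrates the arithmetic (perplexity is the exponential of the negative mean per-token log-probability); the log-probability values shown are made up, and in practice you would obtain them from both models, e.g. via llama.cpp's perplexity tooling:

```python
# Illustrative quality metric: perplexity from per-token log-probabilities.
# The logprob lists below are invented for demonstration, not real model output.

import math

def perplexity(logprobs: list[float]) -> float:
    """exp(-mean(logprobs)), with logprobs in natural log."""
    return math.exp(-sum(logprobs) / len(logprobs))

baseline = perplexity([-2.1, -1.8, -2.4, -2.0])   # e.g. from the BF16 model
quantized = perplexity([-2.2, -1.9, -2.5, -2.1])  # e.g. from the Q4_K_M model
print(f"baseline {baseline:.2f} vs quantized {quantized:.2f}")
```

A small perplexity increase after quantization is expected; a large jump suggests the chosen quant type is too aggressive for this model.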
---
## 👤 Author
- Email: tecaprovn@gmail.com
- Telegram: https://t.me/tamndx
---
## 📄 License
This repository follows the original DeepSeek model license.
- Base model: Apache 2.0 (DeepSeek)
- Only conversion scripts included, no weight modification
---
license: apache-2.0
language:
- en
- zh
- vi
base_model:
- deepseek-ai/DeepSeek-V4-Flash
tags:
- deepseek
- deepseek4
- deepseekpro
- llm
- quantization
- gguf
- llama.cpp
- inference-optimization
---