tecaprovn committed (verified) · Commit 59e1a57 · Parent(s): 6ab88fd

Create README.md
# DeepSeekV4Flash Quantization Repository

This repository provides scripts and guidelines for quantizing the **DeepSeek V4 Flash** model, enabling reduced model size and optimized inference performance.

---

## 🚀 Purpose
- Reduce model size (BF16 → Q3/Q4/Q5/Q8, etc.)
- Improve inference speed
- Enable deployment on limited GPU/CPU resources
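As a rough illustration of the size savings, the sketch below estimates on-disk size from parameter count and average bits per weight. The bits-per-weight figures are approximate averages for llama.cpp quant types, not exact values; real GGUF files vary with the per-tensor quant mix.

```python
# Rough estimate of model file size for different quantization levels.
# Bits-per-weight values are approximate averages for llama.cpp quant types.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
}

def estimate_size_gb(n_params: float, quant: str) -> float:
    """Approximate on-disk size in GiB for n_params weights at a given quant."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / (1024 ** 3)

if __name__ == "__main__":
    n = 7e9  # example: a 7B-parameter model
    for q in BITS_PER_WEIGHT:
        print(f"{q:8s} ~ {estimate_size_gb(n, q):6.1f} GiB")
```

For a 7B-parameter model this puts Q4_K_M at roughly a third of the BF16 footprint, which is the main motivation for quantizing before deployment.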

---

## 🌍 Languages
- English (en)
- Chinese (zh)
- Vietnamese (vi)

---

## 🧠 Base Model
- [DeepSeek-V4-Flash](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash)

![v4-benchmark-2](https://cdn-uploads.huggingface.co/production/uploads/671ab90d28ec35263e09152f/UBU1Is26bS0_EsHiB-EzN.png)


![v4-efficiency](https://cdn-uploads.huggingface.co/production/uploads/671ab90d28ec35263e09152f/qdpeuZlwveFRjYbUZiZJd.png)

---

## 📦 Contents
- Model conversion and quantization scripts
- Usage examples for llama.cpp / GGUF workflows
- Common quantization configurations

---

## 🛠️ Requirements
- Python >= 3.12
- Latest version of llama.cpp (with GGUF support)
- HuggingFace Transformers (if converting from HF format)
- Sufficient RAM/VRAM depending on model size
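A minimal setup sketch for the requirements above, using the standard llama.cpp build steps (adjust the build flags for your platform and GPU backend):

```shell
# Clone and build llama.cpp (provides llama-quantize and llama-cli)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Install the Python dependencies used by the conversion script
pip install -r requirements.txt
```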

---

## ⚙️ Example Usage

```bash
# convert_hf_to_gguf.py expects a local model directory, so download first
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash --local-dir models/DeepSeek-V4-Flash

# Convert the HF checkpoint to GGUF
python convert_hf_to_gguf.py models/DeepSeek-V4-Flash --outfile models/DeepSeekV4Flash-bf16.gguf

# Quantize: llama-quantize <input.gguf> <output.gguf> <type>
./llama-quantize models/DeepSeekV4Flash-bf16.gguf models/DeepSeekV4Flash-Q4_K_M.gguf Q4_K_M
```

---

## 📌 Notes
- Quantization may require significant system memory depending on model size
- Some quantization formats may not be compatible with all runtimes or versions
- Always validate output quality after quantization
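One way to validate output quality is to compare perplexity before and after quantization using the llama-perplexity tool that ships with llama.cpp. File names and the evaluation text here are illustrative:

```shell
# Compare perplexity of the full-precision and quantized models
# on the same evaluation text; a large gap signals quality loss.
./llama-perplexity -m models/DeepSeekV4Flash-bf16.gguf -f wikitext-2-raw/wiki.test.raw
./llama-perplexity -m models/DeepSeekV4Flash-Q4_K_M.gguf -f wikitext-2-raw/wiki.test.raw
```

A small increase in perplexity (relative to the full-precision baseline) is expected; a large jump suggests the chosen quant type is too aggressive for this model.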

---

## 👤 Author

- Email: tecaprovn@gmail.com
- Telegram: https://t.me/tamndx

---

## 📄 License

This repository follows the original DeepSeek model license.

- Base model: Apache 2.0 (DeepSeek)
- Only conversion scripts are included; the model weights are not modified

Files changed (1): README.md (+18, −0)

README.md front matter added by this commit:

```yaml
---
license: apache-2.0
language:
- en
- zh
- vi
base_model:
- deepseek-ai/DeepSeek-V4-Flash
tags:
- deepseek
- deepseek4
- deepseekpro
- llm
- quantization
- gguf
- llama.cpp
- inference-optimization
---
```