---
license: apache-2.0
language:
- en
- zh
- vi
base_model:
- deepseek-ai/DeepSeek-V4-Flash
tags:
- deepseek
- deepseek4
- deepseekpro
- llm
- quantization
- gguf
- llama.cpp
- inference-optimization
---

# DeepSeekV4Flash Quantization Repository

This repository provides scripts and guidelines for quantizing the **DeepSeek V4 Flash** model, reducing its size and optimizing inference performance.

---

## 🚀 Purpose
- Reduce model size (BF16 → Q3/Q4/Q5/Q8, etc.)
- Improve inference speed
- Enable deployment on limited GPU/CPU resources
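To make the first bullet concrete, here is a back-of-the-envelope size estimate. The bits-per-weight figures are approximate llama.cpp averages (k-quants store per-block scales, so Q4_K_M is closer to ~4.8 bpw than 4.0), and the 7B parameter count is a placeholder, not the real DeepSeek V4 Flash size:

```python
# Rough GGUF size estimate: file size ≈ n_params * bits_per_weight / 8.
# Bits-per-weight values below are approximate llama.cpp figures.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,     # approximate
    "Q5_K_M": 5.7,   # approximate
    "Q4_K_M": 4.8,   # approximate
    "Q3_K_M": 3.9,   # approximate
}

def estimate_gib(n_params: float, quant: str) -> float:
    """Estimated file size in GiB for n_params weights at the given quant type."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**30

# Hypothetical 7B-parameter model -- NOT the real DeepSeek V4 Flash size:
for quant in BITS_PER_WEIGHT:
    print(f"{quant:8s} ~ {estimate_gib(7e9, quant):5.1f} GiB")
```

At these figures, Q4_K_M is roughly a 3.3x reduction over BF16; exact file sizes also depend on metadata and per-tensor type choices.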

---

## 🌍 Languages
- English (en)
- Chinese (zh)
- Vietnamese (vi)

---

## 🧠 Base Model
- [DeepSeek-V4-Flash](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash)

---

## 📦 Contents
- Model conversion and quantization scripts
- Usage examples for llama.cpp / GGUF workflows
- Common quantization configurations

---

## 🛠️ Requirements
- Python >= 3.12
- A recent build of llama.cpp (with GGUF support)
- Hugging Face Transformers (if converting from HF format)
- Sufficient RAM/VRAM for the chosen model and quantization type

---

## ⚙️ Example Usage

```bash
# Convert the Hugging Face checkpoint (downloaded locally) to GGUF.
# convert_hf_to_gguf.py takes the model directory as a positional argument,
# not a --model flag.
python convert_hf_to_gguf.py ./DeepSeek-V4-Flash --outtype bf16 --outfile models/DeepSeekV4Flash.gguf

# Quantize: llama-quantize <input.gguf> <output.gguf> <type>
./llama-quantize models/DeepSeekV4Flash.gguf models/DeepSeekV4Flash-Q4_K_M.gguf Q4_K_M
```
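Under the hood, GGUF quant types store weights in small blocks that share a scale. A minimal pure-Python sketch of the Q8_0 idea (32 weights per block, one max-abs scale, int8 values) — simplified relative to the real llama.cpp kernels, which use fp16 scales and packed layouts:

```python
import numpy as np

BLOCK = 32  # Q8_0 groups 32 weights per block in llama.cpp

def q8_0_quantize(w: np.ndarray):
    """Simplified Q8_0: per-block max-abs scale plus int8 weights."""
    w = w.reshape(-1, BLOCK)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def q8_0_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reverse the mapping: int8 values times the per-block scale."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = q8_0_quantize(w)
err = np.abs(q8_0_dequantize(q, s) - w).max()
print(f"max abs round-trip error: {err:.4f}")  # small relative to weight scale
```

Lower-bit types like Q4_K_M follow the same block idea with fewer bits per value and extra per-superblock scale hierarchy, trading more error for smaller files.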

---

## 📌 Notes
- Quantization may require significant system memory depending on model size
- Some quantization formats may not be compatible with all runtimes or versions
- Always validate output quality after quantization
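One cheap validation is to compare the original and quantized models' next-token choices on the same prompts. A generic sketch in pure Python — `orig` and `quant` below are dummy per-position logit rows standing in for real model outputs:

```python
def top1_agreement(logits_a, logits_b):
    """Fraction of positions where both models pick the same argmax token."""
    same = sum(
        max(range(len(a)), key=a.__getitem__) == max(range(len(b)), key=b.__getitem__)
        for a, b in zip(logits_a, logits_b)
    )
    return same / len(logits_a)

# Dummy logits for 3 positions over a 4-token vocabulary (illustrative only):
orig  = [[0.1, 2.0, 0.3, 0.0], [1.5, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 3.0]]
quant = [[0.2, 1.8, 0.4, 0.1], [0.1, 1.9, 0.0, 0.2], [0.1, 0.0, 0.3, 2.5]]
print(top1_agreement(orig, quant))  # 2 of 3 positions agree
```

For a more standard check, llama.cpp also ships a perplexity tool that can be run against both GGUF files on the same evaluation text.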

---

## 👤 Author

- Email: tecaprovn@gmail.com
- Telegram: https://t.me/tamndx

---

## 📄 License

This repository follows the original DeepSeek model license.

- Base model: Apache 2.0 (DeepSeek)
- Only conversion scripts are included; the model weights themselves are not modified