tecaprovn committed
Commit 001f00d · verified · 1 parent: 59e1a57

Update README.md

Files changed (1): README.md (+75 −1)
README.md CHANGED
tags:
  - gguf
  - llama.cpp
  - inference-optimization
---

# DeepSeekV4Flash Quantization Repository

![v4-benchmark-2](https://cdn-uploads.huggingface.co/production/uploads/671ab90d28ec35263e09152f/WXhyPJ5E8r3B2p0TO8us6.png)

This repository provides scripts and guidelines for quantizing the **DeepSeek V4 Flash** model, reducing its size on disk and improving inference performance.

![v4-efficiency](https://cdn-uploads.huggingface.co/production/uploads/671ab90d28ec35263e09152f/m4HSN3MmYyW2SHZytAbFE.png)

---

## 🚀 Purpose
- Reduce model size (BF16 → Q3/Q4/Q5/Q8, etc.); see the sketch after this list
- Improve inference speed
- Enable deployment on limited GPU/CPU resources
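
A minimal sketch of producing several quantization levels from a single converted GGUF file. The source file name follows the Example Usage section below; the loop itself is an illustration, not a script shipped in this repo:

```bash
# Illustrative batch: emit several quantization levels from one source GGUF.
# Assumes models/DeepSeekV4Flash.gguf was produced as in "Example Usage" below.
for q in Q3_K_M Q4_K_M Q5_K_M Q8_0; do
  ./llama-quantize models/DeepSeekV4Flash.gguf "models/DeepSeekV4Flash-${q}.gguf" "$q"
done
```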

---

## 🌍 Languages
- English (en)
- Vietnamese (vi)

---

## 🧠 Base Model
- [DeepSeek-V4-Flash](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash)

---

## 📦 Contents
- Model conversion and quantization scripts
- Usage examples for llama.cpp / GGUF workflows
- Common quantization configurations

---

## 🛠️ Requirements
- Python >= 3.12
- A recent build of llama.cpp with GGUF support (build sketch below)
- Hugging Face Transformers (if converting from HF format)
- Sufficient RAM/VRAM for the chosen model size
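
A minimal build sketch for llama.cpp, assuming a standard CMake toolchain; exact flags and output paths can vary by platform and version:

```bash
# Clone and build llama.cpp (assumes git and CMake are installed).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# Tools such as llama-quantize and llama-cli end up under build/bin/.
```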

---

## ⚙️ Example Usage

```bash
# Download the checkpoint locally, then convert it to GGUF
# (convert_hf_to_gguf.py takes the model directory as a positional argument).
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash --local-dir DeepSeek-V4-Flash
python convert_hf_to_gguf.py DeepSeek-V4-Flash --outfile models/DeepSeekV4Flash.gguf

# Quantize the converted file to Q4_K_M, writing to an explicit output path.
./llama-quantize models/DeepSeekV4Flash.gguf models/DeepSeekV4Flash-Q4_K_M.gguf Q4_K_M
```
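
A quick way to smoke-test the quantized file is llama.cpp's llama-cli; the prompt and token count here are placeholders:

```bash
# Load the quantized model and generate a short completion.
./llama-cli -m models/DeepSeekV4Flash-Q4_K_M.gguf -p "Hello, introduce yourself." -n 64
```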

---

## 📌 Notes
- Quantization may require significant system memory, depending on model size
- Some quantization formats may not be compatible with all runtimes or versions
- Always validate output quality after quantization (see the perplexity check below)
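
One common validation is comparing perplexity between the unquantized and quantized GGUF files with llama.cpp's llama-perplexity tool; the evaluation text file below is a placeholder you must supply:

```bash
# Lower perplexity is better; a small gap between the two runs suggests
# the quantization preserved output quality on this evaluation text.
./llama-perplexity -m models/DeepSeekV4Flash.gguf -f eval-text.txt
./llama-perplexity -m models/DeepSeekV4Flash-Q4_K_M.gguf -f eval-text.txt
```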

---

## 👤 Author

- Email: tecaprovn@gmail.com
- Telegram: https://t.me/tamndx

---

## 📄 License

This repository follows the original DeepSeek model license.

- Base model: Apache 2.0 (DeepSeek)
- Only conversion scripts are included; the model weights are not modified