We provide high-quality GGUF quantizations of the best open-source language models, optimized for local inference on Apple Silicon Macs.
We select the best small general-purpose models and quantize them using llama.cpp with carefully chosen quantization levels. Every model is tested on Apple Silicon hardware before release.
Our current lineup, with file sizes per quantization level:
| Model | Parameters | Q4_K_M | Q5_K_M | Q8_0 |
|---|---|---|---|---|
| Qwen2.5-7B-Instruct-GGUF | 7B | 4.4 GB | 5.1 GB | 7.5 GB |
| Mistral-7B-Instruct-v0.3-GGUF | 7B | 4.1 GB | 4.8 GB | 7.2 GB |
| Phi-4-mini-GGUF | 3.8B | 2.3 GB | 2.6 GB | 3.8 GB |
| Qwen2.5-3B-Instruct-GGUF | 3B | 1.8 GB | 2.1 GB | 3.1 GB |
| SmolLM2-1.7B-Instruct-GGUF | 1.7B | 1.0 GB | 1.1 GB | 1.7 GB |
We offer three quantization levels per model:

| Type | Bits per Weight | Best For |
|---|---|---|
| Q4_K_M | ~4.6 bpw | Recommended - Best quality/size ratio for everyday use |
| Q5_K_M | ~5.3 bpw | Higher quality with minimal size increase |
| Q8_0 | ~8.0 bpw | Near-original quality for maximum accuracy |
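As a rough rule of thumb, a quantized file's size is parameters × bits-per-weight ÷ 8, plus a few percent for metadata and tensors kept at higher precision. This sketch (the 5% overhead factor is an assumption, not an exact figure) shows the arithmetic:

```shell
# Rough GGUF size estimate in GB: params (billions) x bpw / 8,
# padded ~5% for metadata and higher-precision tensors (assumed overhead).
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 * 1.05 }'
}
estimate_gb 7 4.6    # prints 4.2 -- actual Q4_K_M files run slightly larger
estimate_gb 3.8 8.0  # prints 4.0
```

Real files (see the table above) come in a bit above this estimate because some tensors, such as embeddings, are often stored at higher precision.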
With Ollama:

```shell
# Download a GGUF file, then:
cat > Modelfile <<'EOF'
FROM ./qwen2.5-7b-instruct-Q4_K_M-worthdoing.gguf
EOF
ollama create qwen2.5-7b -f Modelfile
ollama run qwen2.5-7b
```
With llama.cpp:

```shell
llama-cli -m qwen2.5-7b-instruct-Q4_K_M-worthdoing.gguf -p "Your prompt" -ngl 99
```
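llama.cpp can also serve the model over an OpenAI-compatible HTTP API via `llama-server` (assumes a recent llama.cpp build; port and prompt are illustrative):

```shell
# Serve the model locally; -ngl 99 offloads all layers to the GPU (Metal on Apple Silicon)
llama-server -m qwen2.5-7b-instruct-Q4_K_M-worthdoing.gguf -ngl 99 --port 8080

# Then, from another shell:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```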
Download any GGUF file and import it directly into LM Studio.
| RAM | Recommended Models |
|---|---|
| 8 GB | SmolLM2-1.7B (any quant), Qwen2.5-3B Q4_K_M/Q5_K_M |
| 16 GB | Any 3-4B model (any quant), 7B models Q4_K_M |
| 32 GB+ | Any model, any quantization |
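The table above can be sketched as a small helper for scripts (the tier strings are illustrative summaries of the table, and the function is a hypothetical example, not a shipped tool):

```shell
# Map available RAM (integer GB) to a model tier, mirroring the table above.
recommend() {
  if [ "$1" -ge 32 ]; then
    echo "any model, any quantization"
  elif [ "$1" -ge 16 ]; then
    echo "3-4B any quant, 7B Q4_K_M"
  else
    echo "SmolLM2-1.7B or Qwen2.5-3B Q4_K_M"
  fi
}
recommend 16  # prints: 3-4B any quant, 7B Q4_K_M
```

Note that the OS and other applications also need RAM, so these tiers leave headroom rather than using all available memory for the model.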
Worth Doing AI is focused on making high-quality AI accessible for local, private use. All quantizations are performed with llama.cpp and verified on Apple Silicon hardware.
Contact: admin@worthdoing.ai