Instructions for using ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with Transformers:
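```python
# Transformers reads one GGUF file from the repo via `gguf_file` and dequantizes it on load.
# Kimi-Linear support in Transformers' GGUF loader is an assumption, not confirmed here.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

repo_id = "ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF"
gguf_file = "Kimi-Linear-48B-A3B-Instruct-abliterated-BPW3.00.gguf"

# Load model directly
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file, dtype="auto")

# Use a pipeline as a high-level helper
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
- llama-cpp-python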
How to use ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF",
    filename="Kimi-Linear-48B-A3B-Instruct-abliterated-BPW3.00.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with llama.cpp:
Install from brew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF

# Run inference directly in the terminal:
llama-cli -hf ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
Install from WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF

# Run inference directly in the terminal:
llama-cli -hf ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
Use pre-built binary
```bash
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF

# Run inference directly in the terminal:
./llama-cli -hf ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
Build from source code
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF

# Run inference directly in the terminal:
./build/bin/llama-cli -hf ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
Use Docker
```bash
docker model run hf.co/ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
- LM Studio
- Jan
- vLLM
How to use ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
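llama-server (which listens on port 8080 by default) and `vllm serve` (port 8000 in the curl call above) both expose an OpenAI-compatible `/v1/chat/completions` endpoint, as does the SGLang server below. A minimal standard-library Python client is sketched here as an alternative to curl; the helper names `build_chat_request`, `parse_reply` and `chat` are hypothetical, and only the request and response fields already used in the curl examples are assumed.

```python
import json
import urllib.request

MODEL_ID = "ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_reply(body: bytes) -> str:
    """Extract the assistant text from an OpenAI-style chat completion response body."""
    return json.loads(body)["choices"][0]["message"]["content"]


def chat(base_url: str, prompt: str, timeout: float = 600.0) -> str:
    """Send one prompt and return the reply; the server must already be running."""
    with urllib.request.urlopen(build_chat_request(base_url, prompt), timeout=timeout) as resp:
        return parse_reply(resp.read())
```

For example, `chat("http://localhost:8000", "What is the capital of France?")` targets the vLLM server, and `"http://localhost:8080"` targets llama-server. Keeping request building and parsing separate from the network call lets both be checked without a running server.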
- SGLang
How to use ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
- Ollama
How to use ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with Ollama:
```bash
ollama run hf.co/ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
- Unsloth Studio
How to use ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF to start chatting
```
Using Hugging Face Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF to start chatting.
- Pi
How to use ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with Pi:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
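Pasting the `llama-cpp` block into an existing `~/.pi/agent/models.json` by hand can clobber providers that are already configured. A small sketch that merges the entry instead; it assumes only the layout shown in the Pi example (a top-level `providers` object keyed by provider name), and `merge_provider` and `add_provider` are hypothetical helpers, not part of Pi.

```python
import json
from pathlib import Path

# Provider entry copied from the Pi example; change baseUrl if llama-server
# runs on another host or port.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF"}],
}


def merge_provider(config: dict, name: str, provider: dict) -> dict:
    """Return a copy of a models.json dict with one provider added or replaced."""
    merged = dict(config)
    merged["providers"] = {**config.get("providers", {}), name: provider}
    return merged


def add_provider(config_path: Path, name: str, provider: dict) -> None:
    """Read models.json if it exists, merge the provider in, and write it back."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(merge_provider(config, name, provider), indent=2) + "\n")
```

Usage: `add_provider(Path.home() / ".pi" / "agent" / "models.json", "llama-cpp", LLAMA_CPP_PROVIDER)`. The pure `merge_provider` step never mutates its input, so other providers survive unchanged.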
- Hermes Agent
How to use ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with Hermes Agent:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
Run Hermes
```bash
hermes
```
- Atomic Chat
- Docker Model Runner
How to use ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with Docker Model Runner:
```bash
docker model run hf.co/ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
- Lemonade
How to use ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull ENOSYS/Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF
```
Run and chat with the model
```bash
lemonade run user.Kimi-Linear-48B-A3B-Instruct-abliterated-1000-v1-GGUF-{{QUANT_TAG}}
```
List all available models
```bash
lemonade list
```
Experimental quantization of huihui-ai/Huihui-Kimi-Linear-48B-A3B-Instruct-abliterated to a global target bits per weight (BPW), i.e. one overall BPW budget for the whole model.
- Quantized with a non-standard (forked) llama.cpp branch.
- KLD evaluation and imatrix calibration datasets were built with a CLI tool for GGUF models, from data sourced from eaddario/imatrix-calibration.
- Dataset sources: tools, math, code, text_en, text_ru.
- Dataset chunks: 1000.
- Tensors use F16 rather than BF16 where a 16-bit type is needed, which suits NVIDIA Pascal GPUs such as the P100 (Pascal has no native BF16 support).
- A small set of patches was added.
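The F16 choice trades range for precision: F16 keeps 10 mantissa bits but its largest finite value is 65504, while BF16 keeps float32's 8-bit exponent (and range) with only 7 mantissa bits. The sketch below rounds values to each format using only Python's `struct` module to make that trade-off concrete. It illustrates the two formats in general; the card does not say how the forked converter handles values outside the F16 range, and the helper names are hypothetical.

```python
import struct


def to_f16(x: float) -> float:
    """Round x to IEEE 754 half precision (F16); raises OverflowError if |x| is too large."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Round a finite x to bfloat16 (BF16): the top 16 bits of its float32 pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 dropped bits, then keep only the top 16.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fits_in_f16(x: float) -> bool:
    """True if x can be stored as F16 without overflowing."""
    try:
        to_f16(x)
        return True
    except OverflowError:
        return False
```

For example, 0.1 rounds to 0.0999755859375 in F16 but to 0.10009765625 in BF16, so F16 is closer; 70000.0 overflows F16 yet rounds to 70144.0 in BF16.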
Many thanks to Ed Addario for an impressive job.
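The comparison table below scores each quantized file against the unquantized model on the same evaluation text. The per-token quantities behind its KLD, Δp and PPL ratio columns can be sketched in plain Python, assuming the usual definitions reported by llama.cpp's `llama-perplexity` in `--kl-divergence` mode: KL(P_base ‖ P_quant) over the next-token distribution, Δp as the change in probability of the token that actually came next (quantized minus base), and the PPL ratio as PPL(Q) / PPL(base). This is an illustration of those definitions, not llama.cpp's code, and the function names are hypothetical.

```python
import math


def kl_divergence(p_base, p_quant):
    """KL(P_base || P_quant) in nats for one token position."""
    # Terms with p == 0 contribute nothing to the sum, so they are skipped.
    return sum(p * math.log(p / q) for p, q in zip(p_base, p_quant) if p > 0)


def summarize(positions):
    """Aggregate statistics over (p_base, p_quant, target_index) tuples.

    p_base and p_quant are next-token probability lists from the base and the
    quantized model; target_index is the token that actually came next.
    """
    n = len(positions)
    klds = [kl_divergence(pb, pq) for pb, pq, _ in positions]
    # Delta-p: change in probability of the actual next token (quantized minus base).
    dps = [pq[t] - pb[t] for pb, pq, t in positions]
    # Mean negative log-likelihood of the actual next token; PPL = exp(mean NLL).
    nll_base = sum(-math.log(pb[t]) for pb, _, t in positions) / n
    nll_quant = sum(-math.log(pq[t]) for _, pq, t in positions) / n
    return {
        "mean_kld": sum(klds) / n,
        "mean_dp": sum(dps) / n,
        "rms_dp": math.sqrt(sum(d * d for d in dps) / n),
        "ppl_ratio": math.exp(nll_quant - nll_base),  # PPL(Q) / PPL(base)
    }
```

A positive mean KLD together with a negative mean Δp, as in every row of the table, means the quantized model drifts from the base distribution and on average gives the reference token less probability; the table reports Δp in percent.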
Quantization comparison
| BPW/TGS | SIZE (MiB) | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Median KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|---|---|---|---|---|---|---|---|---|---|---|
| 3.000 | 17574 | 97.54% | 1.099746 ± 0.001198 | 0.703413 ± 0.008959 | 0.122940 ± 0.000426 | 0.066714 | 9.294135 | 2.637652 | -2.015 ± 0.019 % | 10.272 ± 0.039 % |
| 3.250 | 19038 | 98.25% | 1.064443 ± 0.000969 | 0.454454 ± 0.006978 | 0.089130 ± 0.000308 | 0.050624 | 10.323897 | 1.992209 | -1.732 ± 0.016 % | 8.814 ± 0.035 % |
| 3.500 | 20502 | 98.69% | 1.042289 ± 0.000821 | 0.298224 ± 0.005835 | 0.064519 ± 0.000265 | 0.032550 | 8.311751 | 1.765223 | -1.217 ± 0.014 % | 7.481 ± 0.033 % |
| 3.750 | 21966 | 99.04% | 1.030411 ± 0.000697 | 0.214459 ± 0.004989 | 0.047054 ± 0.000201 | 0.022846 | 8.346386 | 1.258301 | -0.654 ± 0.012 % | 6.340 ± 0.031 % |
| 4.000 | 23430 | 99.15% | 1.028018 ± 0.000653 | 0.197583 ± 0.004668 | 0.042084 ± 0.000178 | 0.020843 | 7.031458 | 1.095171 | -0.639 ± 0.011 % | 5.992 ± 0.029 % |
| 4.250 | 24894 | 99.25% | 1.021574 ± 0.000610 | 0.152141 ± 0.004327 | 0.036486 ± 0.000158 | 0.018297 | 6.763760 | 1.009924 | -0.568 ± 0.011 % | 5.619 ± 0.028 % |
| 4.500 | 26358 | 99.50% | 1.012304 ± 0.000495 | 0.086766 ± 0.003506 | 0.024419 ± 0.000105 | 0.012078 | 5.742187 | 0.643191 | -0.204 ± 0.009 % | 4.565 ± 0.023 % |
| 4.750 | 27822 | 99.61% | 1.007470 ± 0.000434 | 0.052677 ± 0.003063 | 0.019040 ± 0.000086 | 0.009403 | 4.860535 | 0.504090 | -0.200 ± 0.008 % | 4.035 ± 0.020 % |
| 5.000 | 29286 | 99.67% | 1.006065 ± 0.000397 | 0.042769 ± 0.002799 | 0.015935 ± 0.000073 | 0.008021 | 3.658825 | 0.425957 | -0.201 ± 0.007 % | 3.731 ± 0.020 % |
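One way to choose a variant from the table is to take the smallest file whose mean KLD fits a chosen budget. The sketch below hard-codes three columns from the table; the budgets in the example are arbitrary, and `smallest_within_kld` and `size_per_bpw` are hypothetical helpers. It also fits size against BPW: every 0.25 BPW step adds exactly 1464 MiB, so the sizes lie on one line.

```python
# (BPW, size in MiB, mean KLD), copied from the quantization comparison table.
ROWS = [
    (3.000, 17574, 0.122940),
    (3.250, 19038, 0.089130),
    (3.500, 20502, 0.064519),
    (3.750, 21966, 0.047054),
    (4.000, 23430, 0.042084),
    (4.250, 24894, 0.036486),
    (4.500, 26358, 0.024419),
    (4.750, 27822, 0.019040),
    (5.000, 29286, 0.015935),
]


def smallest_within_kld(rows, max_mean_kld):
    """Return the smallest-size row whose mean KLD is at most max_mean_kld, or None."""
    candidates = [row for row in rows if row[2] <= max_mean_kld]
    return min(candidates, key=lambda row: row[1]) if candidates else None


def size_per_bpw(rows):
    """Fit size = slope * bpw + intercept through the first and last rows."""
    (b0, s0, _), (b1, s1, _) = rows[0], rows[-1]
    slope = (s1 - s0) / (b1 - b0)
    return slope, s0 - slope * b0
```

With a mean KLD budget of 0.05, `smallest_within_kld(ROWS, 0.05)` picks the 3.75 BPW file (21966 MiB). The fit gives 5856 MiB per BPW plus about 6 MiB; since each extra bit per weight adds one byte per eight weights, that slope corresponds to roughly 49 billion weights (8 × 5856 × 2^20), in line with the 48B in the model name.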