LFM2.5-8B-A1B - GGUF Quantized Versions

This repository provides GGUF quantized versions of LiquidAI/LFM2.5-8B-A1B, converted with llama.cpp.

These files are intended for local inference with llama.cpp, Ollama, LM Studio, Jan, Open WebUI, and llama-cpp-python.

Model Details

  • Base model: LiquidAI/LFM2.5-8B-A1B
  • Architecture: lfm2moe (mixture-of-experts)
  • Format: GGUF
  • Source license: other
  • Conversion tool: convert_hf_to_gguf.py from llama.cpp
  • Quantization tool: llama-quantize
  • Recommended file: LFM2.5-8B-A1B-Q4_K_M.gguf

Quantized Files

Quant  | Filename                   | Size       | SHA256 (truncated) | Notes
-------|----------------------------|------------|--------------------|----------------------------------
FP16   | LFM2.5-8B-A1B-FP16.gguf    | ~15.78 GiB | 2a6007558690...    | Unquantized 16-bit GGUF baseline
Q2_K   | LFM2.5-8B-A1B-Q2_K.gguf    | ~2.97 GiB  | 8f23e224429e...    | Smallest, lowest quality
Q3_K_M | LFM2.5-8B-A1B-Q3_K_M.gguf  | ~3.83 GiB  | 5a79627fc67f...    | Small balanced version
Q4_0   | LFM2.5-8B-A1B-Q4_0.gguf    | ~4.51 GiB  | c0a1d6adfbaa...    | Simple 4-bit quantization
Q4_K_M | LFM2.5-8B-A1B-Q4_K_M.gguf  | ~4.80 GiB  | 96dd85418cff...    | Recommended default for most users
Q5_K_M | LFM2.5-8B-A1B-Q5_K_M.gguf  | ~5.62 GiB  | bfa79de6abb8...    | Better quality with moderate size
Q6_K   | LFM2.5-8B-A1B-Q6_K.gguf    | ~6.48 GiB  | 12e71861dee1...    | High quality
Q8_0   | LFM2.5-8B-A1B-Q8_0.gguf    | ~8.39 GiB  | 6fb1629c19fd...    | Near FP16 quality
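
The SHA256 column can be used to check a download. The sketch below is one possible way to do that with the Python standard library; the helper names sha256_of_file and matches_listed_prefix are illustrative and not part of this repository. Because the hashes listed above are truncated, only a prefix comparison is possible against this table; a full check would need the complete hash from the repository's file listing.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_prefix(path, listed):
    """Compare a file's SHA256 against a truncated hash such as '96dd85418cff...'.

    Trailing dots are treated as truncation markers and ignored.
    """
    prefix = listed.rstrip(".").lower()
    return sha256_of_file(path).startswith(prefix)
```

For example, matches_listed_prefix("LFM2.5-8B-A1B-Q4_K_M.gguf", "96dd85418cff...") should return True for an intact download. A prefix match is weaker evidence than a full-hash match.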

Validation

Each file was loaded with llama-cli and run through a short generation as a basic smoke test.

Quant  | Filename                   | Status
-------|----------------------------|----------
FP16   | LFM2.5-8B-A1B-FP16.gguf    | ✅ passed
Q2_K   | LFM2.5-8B-A1B-Q2_K.gguf    | ✅ passed
Q3_K_M | LFM2.5-8B-A1B-Q3_K_M.gguf  | ✅ passed
Q4_0   | LFM2.5-8B-A1B-Q4_0.gguf    | ✅ passed
Q4_K_M | LFM2.5-8B-A1B-Q4_K_M.gguf  | ✅ passed
Q5_K_M | LFM2.5-8B-A1B-Q5_K_M.gguf  | ✅ passed
Q6_K   | LFM2.5-8B-A1B-Q6_K.gguf    | ✅ passed
Q8_0   | LFM2.5-8B-A1B-Q8_0.gguf    | ✅ passed
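
A similar load-and-generate check can be reproduced locally. The sketch below is not the procedure used for the table above (that used llama-cli); it is a rough equivalent that assumes llama-cpp-python is installed. The smoke_test function and its load_model parameter are hypothetical names introduced here for illustration.

```python
from pathlib import Path


def smoke_test(model_path, load_model=None, prompt="Hello", max_tokens=8):
    """Return 'passed' if the file loads and produces non-empty text.

    Returns 'missing' if the file does not exist, or 'failed: ...' otherwise.
    load_model is an optional callable taking a path and returning a model
    object; by default llama-cpp-python's Llama class is used.
    """
    path = Path(model_path)
    if not path.is_file():
        return "missing"
    if load_model is None:
        # Imported lazily so the helper can be defined without the library.
        from llama_cpp import Llama

        def load_model(p):
            return Llama(model_path=p, verbose=False)

    try:
        llm = load_model(str(path))
        out = llm(prompt, max_tokens=max_tokens)
        text = out["choices"][0]["text"]
    except Exception as exc:  # any load or generation error counts as a failure
        return f"failed: {exc}"
    return "passed" if text.strip() else "failed: empty output"
```

Calling smoke_test("LFM2.5-8B-A1B-Q4_K_M.gguf") checks only that the file loads and generates; it says nothing about output quality.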

Usage

llama.cpp

llama-cli -m LFM2.5-8B-A1B-Q4_K_M.gguf -p "Hello! Introduce yourself briefly."

Older llama.cpp builds name the binary main instead of llama-cli:

./main -m LFM2.5-8B-A1B-Q4_K_M.gguf -p "Hello! Introduce yourself briefly."

llama.cpp directly from Hugging Face

llama-cli -hf ShahzebKhoso/LFM2.5-8B-A1B-GGUF:Q4_K_M -p "Hello! Introduce yourself briefly."

llama-cpp-python

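from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the recommended quant (cached locally after the first call).
model_path = hf_hub_download(
    repo_id="ShahzebKhoso/LFM2.5-8B-A1B-GGUF",
    filename="LFM2.5-8B-A1B-Q4_K_M.gguf",
)

# Load the GGUF file with default settings.
llm = Llama(model_path=model_path)

# Run a single chat turn and cap the reply length.
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello! Introduce yourself briefly."},
    ],
    max_tokens=128,
)

# The response follows the OpenAI-style chat completion layout.
print(out["choices"][0]["message"]["content"])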
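
For interactive use, the same call can stream tokens as they are generated. The sketch below assumes llama-cpp-python's stream=True option, which yields OpenAI-style chunks whose text arrives under choices[0]["delta"]["content"]; stream_reply is a hypothetical helper name, and llm is expected to be a Llama instance like the one created above.

```python
def stream_reply(llm, messages, max_tokens=128):
    """Yield text pieces from a streamed chat completion as they arrive."""
    chunks = llm.create_chat_completion(
        messages=messages,
        max_tokens=max_tokens,
        stream=True,
    )
    for chunk in chunks:
        # The first chunk usually carries only the role; later ones carry text.
        delta = chunk["choices"][0].get("delta", {})
        piece = delta.get("content")
        if piece:
            yield piece


# Usage, given an existing `llm`:
# for piece in stream_reply(llm, [{"role": "user", "content": "Hello!"}]):
#     print(piece, end="", flush=True)
```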

Which file should I use?

  • Use Q4_K_M as the default: it offers the best balance of size and quality.
  • Use Q5_K_M for better quality at a moderately larger size.
  • Use Q8_0 if you want near-original quality and have more memory.
  • Use Q2_K or Q3_K_M only when memory is very limited.
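
The guidance above can be combined with the file sizes from the Quantized Files table to see which files fit a given memory budget. This is a rough sketch only: the quants_that_fit helper is hypothetical, and the 1.5 GiB default headroom for the KV cache and runtime buffers is an assumption, not a measured figure. Actual memory use depends on context length and backend.

```python
# Approximate file sizes in GiB, copied from the Quantized Files table.
QUANT_SIZES_GIB = {
    "Q2_K": 2.97,
    "Q3_K_M": 3.83,
    "Q4_0": 4.51,
    "Q4_K_M": 4.80,
    "Q5_K_M": 5.62,
    "Q6_K": 6.48,
    "Q8_0": 8.39,
    "FP16": 15.78,
}


def quants_that_fit(available_gib, headroom_gib=1.5):
    """List quants whose file size fits in available memory, smallest first.

    headroom_gib is an assumed allowance for the KV cache and runtime buffers.
    """
    budget = available_gib - headroom_gib
    fitting = [(size, name) for name, size in QUANT_SIZES_GIB.items() if size <= budget]
    return [name for size, name in sorted(fitting)]
```

With 6.5 GiB available and the default headroom, the list ends at Q4_K_M; an empty list suggests that even Q2_K is unlikely to fit.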

Provenance

This repository is a quantized derivative of:

LiquidAI/LFM2.5-8B-A1B

Base model metadata:

revision: 5492b17c7128ec966b5fc661e374ee7edba7423d
pipeline_tag: text-generation
tags: transformers, safetensors, lfm2_moe, text-generation, liquid, lfm2.5, edge, conversational, en, ar, zh, fr, de, ja, ko, es, pt, arxiv:2511.23404, base_model:LiquidAI/LFM2.5-8B-A1B-Base, base_model:finetune:LiquidAI/LFM2.5-8B-A1B-Base