kubelm-qwen2.5-1.5b-v1 โ€” Q4_K_M GGUF

The edge rung of the kubelm tier ladder: a 1.5B-parameter K8sGPT MCP tool-use specialist, fine-tuned with QLoRA on Qwen2.5-1.5B-Instruct and quantized to Q4_K_M (~986 MB on disk, ~1.1 GB serving RAM) for CPU-only deployment. The first kubelm release, and the only tier that also runs under Ollama โ€” its Qwen2.5 backbone loads cleanly where the Qwen3.5 tiers (0.8B / 2B) currently require llama.cpp.

Tier ladder: kubelm-qwen3.5-0.8b-v1 (ultra-edge) ยท this (edge) ยท kubelm-qwen3.5-2b-v1 (edge+, the headline deployable). Each tier is judged within its own resource bracket, not against the one above.

TL;DR

On the 35-scenario v0.3 evaluation library, served at temperature 0:

metric Qwen2.5-1.5B (base) kubelm-qwen2.5-1.5b-v1 qwen2.5-7b (ref) kubelm-qwen3.5-2b-v1 (ref)
conclusion_rubric_passed 9 / 35 29 / 35 28 / 35 32 / 35
reference_calls_passed 7 / 35 27 / 35 28 / 35 32 / 35
fabrications (grounding v2) 65 21 8 3
schema_passed (tool-call) 31 / 35 32 / 35 34 / 35 35 / 35
termination_label == complete 10 / 35 33 / 35 33 / 35 35 / 35
narrative_inconsistencies 0 0 0 0

Honest read. Fine-tuning transforms the base: rubric 9 โ†’ 29, completion 10 โ†’ 33, fabrications 65 โ†’ 21. On reasoning it edges qwen2.5-7b (rubric 29 vs 28) and ties it on completion (33 vs 33) at roughly 1/5 the footprint. The weak spot is real and not hidden: fabrications (21) are higher than the 7B (8) and the 2B (3) โ€” the edge tier reaches the right conclusion reliably but is looser about asserting only tool-grounded facts than the larger tiers. Zero tool-name and zero argument hallucinations across all 35 trajectories.

If grounding strictness matters more than footprint, step up to the 2B. Full rows: eval/results/summaries/shape-d-2026-05-27.json.

Quickstart

Ollama (works for this tier)

Ollama's Qwen2.5 template parses OpenAI-shape tool_calls out of the box:

hf download rbentaarit/kubelm-qwen2.5-1.5b-v1 kubelm-edge.Q4_K_M.gguf --local-dir .

cat > Modelfile <<'EOF'
FROM ./kubelm-edge.Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER num_ctx 16384
EOF
ollama create kubelm-qwen2.5-1.5b -f Modelfile

llama.cpp (llama-server)

brew install llama.cpp   # or build from https://github.com/ggml-org/llama.cpp
hf download rbentaarit/kubelm-qwen2.5-1.5b-v1 kubelm-edge.Q4_K_M.gguf --local-dir .

llama-server \
    -m kubelm-edge.Q4_K_M.gguf \
    --host 127.0.0.1 --port 8088 \
    --jinja \
    -c 16384 \
    -ngl 99   # drop or set 0 on a CPU-only Linux box

Two notes that are load-bearing:

  • --jinja uses the model's embedded Qwen2.5 chat template, including its <tool_call> rendering. Without it, tool-use breaks.
  • -c 16384 matches the model's max_seq_length. Long-trajectory investigations accumulate ~9โ€“11 K tokens of history; a smaller context errors with HTTP 400 request exceeds the available context size.

Unlike the Qwen3.5 tiers, this model is not a thinking model โ€” there is no enable_thinking / no-think serving step to worry about.

In production, drive this through the K8sGPT MCP server and the kubelm eval harness so the model calls real tools against a real cluster.

Intended use

  • Tool-use specialist for K8sGPT MCP investigations on CPU-only hardware (M-series Macs, modest Linux boxes), where an Ollama-native GGUF is convenient.
  • Local component of agentic K8s diagnosis pipelines where the destructive-action layer is handled by K8sGPT's operator + Mutation CR policy gates (the model proposes; the operator gates).

Out of scope

  • Snapshot diagnosis from raw cluster YAML. Trained on multi-step tool-use trajectories, not Q&A pairs over frozen cluster state.
  • Safety / refusal decisions on destructive operations. That layer is architectural in the K8sGPT ecosystem; the model is trained for reliability properties, not behavioral refusal.
  • Direct kubectl usage. The tools list is K8sGPT MCP-specific.
  • General K8s domain knowledge questions outside the K8sGPT MCP tool surface.

Training

  • Base model: Qwen2.5-1.5B-Instruct.
  • Dataset: the v0 cut of rbentaarit/kubelm-seed-v0 โ€” gpt-5.4-authored multi-step trajectories plus mechanical variants, filtered to records that are review-accepted and pass the conclusion-rubric and tool-call schema checks.
  • Method: QLoRA (nf4 + double-quant), rank 32 / alpha 64, target modules q_proj k_proj v_proj o_proj gate_proj up_proj down_proj. LoRA adapter included in this repo under adapter/.
  • Schedule: 2 epochs, batch 8 ร— grad-accum 2 (eff. 16), lr 2e-4 cosine, warmup 3%, max_seq_length 16384, seed 42. Assistant-only loss.
  • Full config: training/configs/kubelm-edge-v0.yaml; recipe: training/sft.py.

Evaluation

Methodology and eval harness: github.com/rbentaarit/kubelm/eval. Each scenario boots a fresh kind cluster, seeds the failure mode, brings up a real K8sGPT MCP server against it, then runs the model through the trajectory loop and grades the result. Mocked MCP servers are not used at any stage.

Versioning

  • K8sGPT version pin: 0.4.32. Tool surface and MCP error shapes change between K8sGPT releases; quality numbers above are not guaranteed against other versions.
  • MCP protocol version: 2025-03-26.

Known issues

  • Fabrication rate (21) is the softest metric. This tier is looser about asserting only tool-grounded facts than the 2B (3) or qwen2.5-7b (8). If your application is sensitive to over-confident grounding, prefer the 2B.
  • No native tool-call format other than OpenAI Chat Completions.

License

Apache 2.0. The base model is Qwen2.5-1.5B-Instruct (Apache 2.0). The training corpus is CC BY 4.0.

Citation

@misc{kubelm_qwen25_1_5b_v1,
  title  = {kubelm-qwen2.5-1.5b-v1},
  author = {Ramzi Ben Taarit and contributors},
  year   = {2026},
  url    = {https://huggingface.co/rbentaarit/kubelm-qwen2.5-1.5b-v1},
  note   = {QLoRA on Qwen2.5-1.5B-Instruct; trained against K8sGPT v0.4.32 MCP trajectories}
}

Source code

All training, evaluation, and dataset-construction code: github.com/rbentaarit/kubelm.

Downloads last month
146
GGUF
Model size
2B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for rbentaarit/kubelm-qwen2.5-1.5b-v1

Quantized
(199)
this model