The Mind of Tashi — micro student (GRPO, GGUF)

The GRPO-trained student exported to GGUF for llama.cpp. Drop-in replacement for the SFT GGUF in the playable Space after an A/B (winning the game is not enough — the mind-scroll prose must hold up). Transformers source: …/mind-of-tashi-micro-grpo.

Build status: this GGUF is built at push time from the GRPO checkpoint — it does not exist as a by-product of training. Use the exact same recipe as the SFT GGUF.

Files (after build)

File Approx size Use
mind-of-tashi-micro-grpo-Q4_K_M.gguf ~256 MB deployed candidate
mind-of-tashi-micro-grpo-f16.gguf ~786 MB zero-loss reference

Build recipe (no compiled binary needed)

  1. Download the GRPO transformers checkpoint with chat_template.jinja (a missing template silently yields a garbage GGUF).
  2. python convert_hf_to_gguf.py <ckpt> --outtype f16 → f16 GGUF.
  3. Quantise via the llama-cpp-python C binding:
    import ctypes, llama_cpp
    p = llama_cpp.llama_model_quantize_default_params()
    p.ftype = 15  # LLAMA_FTYPE_MOSTLY_Q4_K_M
    llama_cpp.llama_model_quantize(b"in-f16.gguf", b"out-Q4_K_M.gguf", ctypes.byref(p))
    
  4. Grade via the format gate through llama-cpp-python (the real deploy path); ship Q4 if it clears ≥15/20 and stays within ~5 ladder points of f16.

⚠️ norm_topk_prob — required for llama.cpp

Inherited norm_topk_prob=true from SFT; llama.cpp's qwen3moe graph hardcodes norm_w=true and a mismatched checkpoint produces garbage on every llama.cpp runtime. (See the SFT GGUF card.)

Usage

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="build-small-hackathon/mind-of-tashi-micro-grpo-gguf",
    filename="mind-of-tashi-micro-grpo-Q4_K_M.gguf",
    n_ctx=4096, n_gpu_layers=0, logits_all=True,
)

Part of the bundle

Game Space · self-play dataset · SFT model + GGUF · OpenEnv gym · GRPO model + GGUF (this) — all under build-small-hackathon/mind-of-tashi-*.

Downloads last month
52
GGUF
Model size
0.4B params
Architecture
qwen3moe
Hardware compatibility
Log In to add your hardware

4-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for build-small-hackathon/mind-of-tashi-micro-grpo-gguf

Space using build-small-hackathon/mind-of-tashi-micro-grpo-gguf 1

Collection including build-small-hackathon/mind-of-tashi-micro-grpo-gguf

Article mentioning build-small-hackathon/mind-of-tashi-micro-grpo-gguf