---
language: en
license: apache-2.0
library_name: mlx
tags:
- hrm
- fine-tuning
- qlora
- sft
- hierarchical-reasoning
model-index:
- name: hrm-1b-sft-v6
  results:
  - task:
      type: text-generation
    metrics:
    - type: weighted-score
      value: 61.7
      name: v3 Benchmark Weighted Score
datasets:
- glaiveai/glaive-function-calling-v2
- iamtarun/code_instructions_120k_alpaca
- openai/gsm8k
- yahma/alpaca-cleaned
- HuggingFaceTB/cosmopedia-100k
---

# HRM-Text-1B SFT QLoRA Adapters (v6)

QLoRA fine-tuned adapters for `Aryagm/HRM-Text-1B-MLX-4bit`, a 1B-parameter hierarchical reasoning model with a recurrent architecture (H=2, L=3, for 8 passes per token). Trained entirely on an 8 GB M2 Mac Mini.

**Part of the [Sid Local LLM Benchmark v3](https://github.com/blackdeerbits/sid-local-llm-bench).**

## Results

| Metric | Base Model | Fine-Tuned (v6) | Delta |
|--------|-----------|-----------------|-------|
| Overall Weighted Score | 58.3% | **61.7%** | **+3.4pp** |
| AGENT (tool calling) | 10% | **60%** | **+50pp** |
| CODE | 70% | 60% | -10pp |
| HALL (hallucination resistance) | 62% | **75%** | **+13pp** |
| INST (instruction following) | 40% | **60%** | **+20pp** |
| CTX (context reasoning) | 75% | 75% | 0pp |

## Files

- `adapters.npz`: final v6 QLoRA weights (~22 MB)
- `best_adapters.npz`: best-validation checkpoint (identical to the final weights)

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | Aryagm/HRM-Text-1B-MLX-4bit (4-bit MXFP4) |
| Method | QLoRA (rank=16, alpha=32) |
| Target layers | Attention projections only (gqkv_proj, o_proj) |
| Training samples | 2,000 |
| Iterations | 2,000 |
| Batch size | 1 (with gradient accumulation) |
| Learning rate | 2e-5 |
| Optimizer | AdamW |
| Loss | Masked response loss (answer tokens only) |
| Hardware | Apple M2 Mac Mini, 8 GB unified memory |

### Dataset Composition

| Source | % | Count |
|--------|---|-------|
| glaiveai/glaive-function-calling-v2 (AGENT) | 20% | 400 |
| iamtarun/code_instructions_120k_alpaca (CODE) | 30% | 600 |
| yahma/alpaca-cleaned (INST) | 25% | 500 |
| openai/gsm8k (MATH) | 15% | 300 |
| HuggingFaceTB/cosmopedia-100k (REPLAY) | 10% | 200 |

## Usage

```python
import mlx.core as mx
import mlx.nn as nn
from mlx_hrm_text.runner import HRMTextGenerator
from mlx_hrm_text.model import set_metal_swiglu

set_metal_swiglu(True)

# Load the 4-bit base model
gen = HRMTextGenerator(
    model_dir="Aryagm/HRM-Text-1B-MLX-4bit",
    temperature=0.3,
)

# Freeze the base weights before wrapping the attention projections
gen.model.freeze()


class LoRALinear(nn.Module):
    """Wraps a frozen linear layer and adds a low-rank update."""

    def __init__(self, linear, r=16, alpha=32):
        super().__init__()
        self.linear = linear
        self.linear.freeze()
        self.r = r
        self.scale = alpha / r
        out_f, in_f = linear.weight.shape
        # Quantized layers store packed uint32 weights, so recover the logical
        # input width (assumes the layer exposes `bits`, as nn.QuantizedLinear does)
        bits = getattr(linear, "bits", None)
        if bits is not None:
            in_f = in_f * 32 // bits
        self.lora_a = mx.random.normal((in_f, r)) / r
        self.lora_b = mx.zeros((r, out_f))

    def __call__(self, x):
        dtype = x.dtype
        update = x @ self.lora_a.astype(dtype) @ self.lora_b.astype(dtype)
        return self.linear(x) + update * self.scale


def apply_lora(module):
    # Patch the attention projections of every block
    for block in module.layers:
        block.attn.gqkv_proj = LoRALinear(block.attn.gqkv_proj)
        block.attn.o_proj = LoRALinear(block.attn.o_proj)


apply_lora(gen.model.model.H_module)
apply_lora(gen.model.model.L_module)

# Load the adapter tensors (parameter name -> array)
flat = mx.load("adapters.npz")
# The tensors in `flat` still have to be copied into the LoRALinear modules;
# the full recursive population is in run_hrm_lora_bench.py on GitHub.

result = gen.generate("Write a Python function to reverse a string.")
print(result.text)
```

## Links

- **Full code + benchmark data:** [github.com/blackdeerbits/sid-local-llm-bench](https://github.com/blackdeerbits/sid-local-llm-bench)
- **HRM fine-tuning article:** [reddeerinv.com/ai/hrm-fine-tuning-journey/](https://reddeerinv.com/ai/hrm-fine-tuning-journey/)
- **Base model:** [huggingface.co/Aryagm/HRM-Text-1B-MLX-4bit](https://huggingface.co/Aryagm/HRM-Text-1B-MLX-4bit)

## Citation

```bibtex
@misc{reddeer2026hrm,
  author = {the_red_deer},
  title = {The HRM Fine-Tuning Journey},
  year = {2026},
  url = {https://reddeerinv.com/ai/hrm-fine-tuning-journey/}
}
```
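
## Note on the Masked Response Loss

The Training Details table lists a masked response loss computed on answer tokens only. The snippet below is a minimal pure-Python sketch of that idea, not the training code from the repository: the function name `masked_response_loss`, the list-based inputs, and the toy numbers are all illustrative assumptions. It averages the negative log-likelihood over positions where the mask is 1 and ignores prompt positions.

```python
import math


def masked_response_loss(token_log_probs, response_mask):
    """Mean negative log-likelihood over response (answer) tokens only.

    token_log_probs: log-probability the model assigned to each target token.
    response_mask:   1 for answer tokens, 0 for prompt tokens.
    (Hypothetical helper for illustration; not part of mlx_hrm_text.)
    """
    if len(token_log_probs) != len(response_mask):
        raise ValueError("log-probs and mask must have the same length")
    n_response = sum(response_mask)
    if n_response == 0:
        return 0.0  # a sample with no answer tokens contributes no loss
    total = -sum(lp * m for lp, m in zip(token_log_probs, response_mask))
    return total / n_response


# Toy sequence: three prompt tokens followed by two answer tokens.
log_probs = [math.log(p) for p in (0.9, 0.1, 0.5, 0.5, 0.25)]
mask = [0, 0, 0, 1, 1]
loss = masked_response_loss(log_probs, mask)
print(f"masked loss: {loss:.4f}")
```

Only the last two positions contribute here, so a poorly predicted prompt token (probability 0.1) has no effect on the loss. This keeps the adapter from spending capacity on reproducing prompts.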