HalluGuard

πŸ›‘οΈ HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation

🌍 Overview

HalluGuard is a 4B-parameter Small Reasoning Model (SRM) designed as a guardrail for Retrieval-Augmented Generation (RAG) pipelines. Given a document and a claim, HalluGuard reasons exclusively over the document to determine whether the claim is grounded or hallucinated, and produces an evidence-grounded justification by citing relevant passages.

HalluGuard is built on unsloth/Qwen3-4B and fine-tuned using LoRA and ORPO on HalluGuard-Preferences-76k, a synthetic preference dataset for hallucination detection derived from FineWeb.

πŸ“– Publication

This model was introduced in our paper at the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026).

HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation

βš™οΈ Model Details

  • Base model: unsloth/Qwen3-4B
  • Task: Three-class document-grounded hallucination detection (GROUNDED, HALLUCINATED_INTRINSIC, HALLUCINATED_EXTRINSIC)
  • Fine-tuning method: LoRA + ORPO
  • LoRA rank / alpha: 16 / 16
  • Trained parameters: ~33M (0.81% of full model)
  • Training data: HalluGuard-Preferences-76k
  • Context window: 32,768 tokens
  • Training hardware: 1× NVIDIA H100 PCIe (80 GB); ~16 hours, ~7.35 kWh
  • Precision: bfloat16
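As a quick sanity check, the trainable-parameter figures above are mutually consistent (the exact full-model parameter count is not stated on this card, so the total below is only implied):

```python
# Implied full-model size from the figures above: ~33M trainable, 0.81% of total.
trainable = 33_000_000
fraction = 0.0081
total = trainable / fraction
print(f"{total / 1e9:.2f}B parameters")  # ~4.07B, consistent with a Qwen3-4B base
```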

πŸš€ Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import json
 
model_name = "lrsbrgrn/HalluGuard-Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
 
def create_prompt(document, claim):
    return json.dumps({
      "instructions": [
          "You will be given a document and a claim.",
          "Decide whether the claim is 'GROUNDED', 'HALLUCINATED_INTRINSIC', or 'HALLUCINATED_EXTRINSIC' based ONLY on the document.",
          "Definitions:",
          "  - GROUNDED: The claim is fully supported by the document. All relevant parts are directly verifiable from the document.",
          "  - HALLUCINATED_INTRINSIC: The claim contradicts what the document states or clearly implies.",
          "  - HALLUCINATED_EXTRINSIC: The claim includes information that is not stated or implied in the document and cannot be verified using only the document (it requires external knowledge).",
          "Justification requirements:",
          "  - Your justification MUST be evidence-grounded.",
          "  - Explicitly refer to the relevant parts of the document (by quoting or paraphrasing them).",
          "  - Explain how these parts SUPPORT, CONTRADICT, or FAIL TO SUPPORT the claim.",
          "  - Do NOT use any external knowledge; rely only on the provided document.",
          "Answer format (VERY IMPORTANT):",
          "  - You MUST respond using EXACTLY the following XML structure:",
          "    <answer>",
          "      <classification>CATEGORY</classification>",
          "      <justification>Your reasoning here</justification>",
          "    </answer>",
          "  - CATEGORY must be ONE of: GROUNDED, HALLUCINATED_INTRINSIC, HALLUCINATED_EXTRINSIC.",
          "  - The <justification> must briefly explain your reasoning and cite evidence from the document.",
          "  - Do NOT add any other text before or after the <answer>...</answer> block.",
          "  - Do NOT add any extra tags or attributes.",
      ],
      "document": f"'{document}'",
      "claim": f"'{claim}'",
    })
 
document = "Apple shares hit record highs, briefly valuing the company at $900B, after beating Wall Street forecasts with strong international sales."
claim = "Apple stock hit record, valuing the company at $900B, after beating Wall Street expectations on international sales."
 
prompt = create_prompt(document, claim)
 
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
 
generated_ids = model.generate(**inputs, max_new_tokens=32768, temperature=0.6, top_p=0.95, top_k=20)
output_ids = generated_ids[0][len(inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

Expected Output

<think>
The claim states that Apple stock hit a record, valuing the company at $900B, after beating Wall Street expectations on international sales. The document says "Apple shares hit record highs, briefly valuing the company at $900B, after beating Wall Street forecasts with strong international sales." All three elements of the claim β€” record high, $900B valuation, and beating Wall Street on international sales β€” are directly stated in the document. The claim is therefore grounded.
</think>
<answer>
  <classification>GROUNDED</classification>
  <justification>The document states that Apple shares hit record highs, briefly valuing the company at $900B, after beating Wall Street forecasts with strong international sales, which directly supports all elements of the claim.</justification>
</answer>
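Downstream code will usually need the classification and justification as structured fields rather than raw text. A minimal, standard-library-only sketch for extracting them from the `<answer>` block (assuming the model followed the required XML format; it returns `None` otherwise):

```python
import re

def parse_answer(text):
    """Extract (classification, justification) from an <answer> block."""
    match = re.search(
        r"<answer>\s*<classification>(.*?)</classification>\s*"
        r"<justification>(.*?)</justification>\s*</answer>",
        text,
        re.DOTALL,
    )
    if match is None:
        return None  # model deviated from the required output format
    return match.group(1).strip(), match.group(2).strip()

label, justification = parse_answer(
    "<answer>\n"
    "  <classification>GROUNDED</classification>\n"
    "  <justification>The document directly supports the claim.</justification>\n"
    "</answer>"
)
print(label)  # GROUNDED
```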

For deployment, you can use sglang>=0.4.6.post1 or vllm>=0.8.5 to create an OpenAI-compatible API endpoint:

  • SGLang:
    python -m sglang.launch_server --model-path lrsbrgrn/HalluGuard-Qwen3-4B --reasoning-parser qwen3
    
  • vLLM:
    vllm serve lrsbrgrn/HalluGuard-Qwen3-4B --enable-reasoning --reasoning-parser deepseek_r1
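Both servers expose an OpenAI-compatible `/v1/chat/completions` route, so any OpenAI-style client can call the model. A hedged sketch of the request payload (the base URL and port are assumptions; SGLang defaults to port 30000 and vLLM to 8000, so adjust to your deployment):

```python
import json

def build_request(document: str, claim: str) -> dict:
    """Build an OpenAI-style chat-completions payload for HalluGuard."""
    prompt = json.dumps({
        "instructions": ["..."],  # same instruction list as in create_prompt above
        "document": f"'{document}'",
        "claim": f"'{claim}'",
    })
    return {
        "model": "lrsbrgrn/HalluGuard-Qwen3-4B",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 32768,
    }

payload = build_request("Apple shares hit record highs...", "Apple stock hit record...")
# POST this payload as JSON to e.g. http://localhost:8000/v1/chat/completions
# (vLLM's default port) with any HTTP client.
```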
    

πŸ“š Citation

If you use HalluGuard in your work, please cite:

@article{bergeron2025halluguard,
  title={HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation},
  author={Bergeron, Loris and Buhnila, Ioana and FranΓ§ois, JΓ©rΓ΄me and State, Radu},
  journal={arXiv preprint arXiv:2510.00880},
  year={2025}
}

βš–οΈ Ethical Considerations

HalluGuard is designed as a decision-support tool, not a fully autonomous system. Over-flagging grounded claims may erode user trust, while missed hallucinations can propagate harmful errors downstream. We strongly recommend pairing HalluGuard with human oversight, especially in sensitive domains such as finance, legal, or healthcare.
