Upload folder using huggingface_hub

- README.md +78 -75
- model.safetensors +1 -1
README.md
CHANGED

@@ -1,57 +1,53 @@
 ---
-language:
-- en
 license: apache-2.0
+language:
+- en
 tags:
-- rag
-- fact-checking
-- mi300x
+- modernbert
+- hallucination-detection
+- rag
+- fact-checking
+- long-context
+- 32k
+- amd
+- rocm
+- mi300x
 datasets:
+- llm-semantic-router/longcontext-haldetect
+base_model:
+- llm-semantic-router/modernbert-base-32k
 pipeline_tag: token-classification
-metrics:
-- f1
 model-index:
-- name: modernbert-base-32k-haldetect
+- name: modernbert-base-32k-haldetect
+  results:
+  - task:
+      type: token-classification
+      name: Hallucination Detection
+    dataset:
+      name: RAGTruth Test Set
+      type: ragtruth
+    metrics:
+    - name: Example-Level F1
+      type: f1
+      value: 76.56
+    - name: Token-Level F1
+      type: f1
+      value: 53.77
+  - task:
+      type: token-classification
+      name: Long-Context Hallucination Detection
+    dataset:
+      name: Long-Context Benchmark (8K-24K tokens)
+      type: llm-semantic-router/longcontext-haldetect
+    metrics:
+    - name: Hallucination F1
+      type: f1
+      value: 49.86
 ---
 
 # 🥬 ModernBERT-base-32k Hallucination Detector
 
-A hallucination detection model fine-tuned on RAGTruth dataset using extended 32K context ModernBERT. **Specifically designed for long documents that exceed 8K tokens.**
+A hallucination detection model fine-tuned on RAGTruth dataset with Data2txt augmentation using extended 32K context ModernBERT. **Specifically designed for long documents that exceed 8K tokens.**
 
 ## 🚀 Why 32K Context Matters
 
@@ -63,30 +59,27 @@ A hallucination detection model fine-tuned on RAGTruth dataset using extended 32
 
 ## Performance
 
-### Long-Context Benchmark (8K-24K tokens)
-
-Evaluated on [`llm-semantic-router/longcontext-haldetect`](https://huggingface.co/datasets/llm-semantic-router/longcontext-haldetect) (337 test samples, avg 17,550 tokens):
-
-| Metric | 32K ModernBERT | 8K LettuceDetect | Improvement |
-|--------|---------------|------------------|-------------|
-| **Samples Truncated** | 0 (0%) | 320 (95%) | **-95%** |
-| Hallucination Precision | 0.458 | 0.529 | -13% |
-| Hallucination Recall | 0.547 | 0.056 | **+877%** |
-| **Hallucination F1** | **0.499** | 0.101 | **+393%** |
-
-**Key Finding**: The 8K model achieves only 5.6% recall because it truncates 95% of samples, losing critical evidence for hallucination detection.
-
 ### RAGTruth Benchmark (Standard, <3K tokens)
 
 Evaluated on RAGTruth test set (2,700 samples):
 
 | Metric | This Model | LettuceDetect BASE | LettuceDetect LARGE |
 |--------|------------|-------------------|---------------------|
-| **Example-Level F1** | **
-| Token-Level F1 |
+| **Example-Level F1** | **76.56%** ✅ | 75.99% | 79.22% |
+| Token-Level F1 | 53.77% | 56.27% | - |
 | Context Window | **32K** | 8K | 8K |
 
-✅ **
+✅ **Exceeds LettuceDetect BASE** on short documents while supporting **4x longer context**
+
+### Long-Context Benchmark (8K-24K tokens)
+
+Evaluated on [llm-semantic-router/longcontext-haldetect](https://huggingface.co/datasets/llm-semantic-router/longcontext-haldetect) (337 test samples, avg 17,550 tokens):
+
+| Metric | 32K ModernBERT | 8K LettuceDetect | Improvement |
+|--------|----------------|------------------|-------------|
+| **Samples Truncated** | 0 (0%) | 320 (95%) | **-95%** |
+| Hallucination Recall | 0.547 | 0.056 | **+877%** |
+| **Hallucination F1** | **0.499** | 0.101 | **+393%** |
 
 ## Model Description
 
@@ -94,9 +87,10 @@ This model detects hallucinations in LLM-generated text by classifying each toke
 
 ### Key Features
 
-- **32K Context Window**: Built on [
+- **32K Context Window**: Built on [llm-semantic-router/modernbert-base-32k](https://huggingface.co/llm-semantic-router/modernbert-base-32k) with YaRN RoPE scaling
 - **Token-Level Classification**: Identifies specific spans that are hallucinated
 - **RAG Optimized**: Trained on RAGTruth benchmark for RAG applications
+- **Data2txt Augmentation**: Enhanced with DART and E2E datasets for better structured data handling
 - **Long Document Support**: Handles legal contracts, financial reports, research papers
 
 ## Usage
 
@@ -144,34 +138,42 @@ spans = detector.predict(context, question, answer)
 
 ## Training Details
 
-###
-
+### Datasets
+
+| Dataset | Samples | Task Type | Description |
+|---------|---------|-----------|-------------|
+| **RAGTruth** | 17,790 | QA, Summary, Data2txt | Human-annotated hallucination spans |
+| **DART** | 2,000 | Data2txt | LLM-generated structured data responses |
+| **E2E** | 1,500 | Data2txt | LLM-generated restaurant descriptions |
+| **Total** | 21,290 | Mixed | Balanced task distribution |
 
 ### Configuration
+
 ```yaml
 base_model: llm-semantic-router/modernbert-base-32k
 max_length: 8192
-batch_size:
+batch_size: 32
 learning_rate: 1e-5
 epochs: 6
-loss: CrossEntropyLoss
+loss: CrossEntropyLoss (weighted)
 scheduler: None (constant LR)
-early_stopping_patience:
+early_stopping_patience: 4
 ```
 
 ### Hardware
+
 - **AMD Instinct MI300X GPU** (192GB HBM3) - Trained entirely on AMD ROCm
-- Training time: ~
-- Framework: PyTorch 2.
+- Training time: ~17 minutes (6 epochs)
+- Framework: PyTorch 2.9 + HuggingFace Transformers on ROCm 7.0
 
 ## When to Use This Model
 
 | Use Case | Recommended Model |
-|----------|------------------|
+|----------|-------------------|
 | Documents > 8K tokens | ✅ **This model** |
 | Multi-document RAG | ✅ **This model** |
 | Legal/Financial docs | ✅ **This model** |
+| Structured data (tables, lists) | ✅ **This model** |
 | Short QA (<3K tokens) | Either model works |
 | Speed critical | 8K model (faster) |
 
@@ -183,15 +185,15 @@ early_stopping_patience: 3
 
 ## Related Resources
 
-- **Long-Context Benchmark**: [
-- **Base Model**: [
-- **Combined Model**: [
+- **Long-Context Benchmark**: [llm-semantic-router/longcontext-haldetect](https://huggingface.co/datasets/llm-semantic-router/longcontext-haldetect) - 3,366 samples, 8K-24K tokens
+- **Base Model**: [llm-semantic-router/modernbert-base-32k](https://huggingface.co/llm-semantic-router/modernbert-base-32k) - Extended ModernBERT
+- **Combined Model**: [modernbert-base-32k-haldetect-combined](https://huggingface.co/llm-semantic-router/modernbert-base-32k-haldetect-combined) - Trained on RAGTruth + HaluEval
 
 ## Citation
 
 ```bibtex
 @misc{modernbert-32k-haldetect,
-  title={
+  title={ModernBERT-32K Hallucination Detector with Data2txt Augmentation},
   author={LLM Semantic Router Team},
   year={2026},
   url={https://huggingface.co/llm-semantic-router/modernbert-base-32k-haldetect}
@@ -200,6 +202,7 @@ early_stopping_patience: 3
 
 ## Acknowledgments
 
-- Built on [LettuceDetect](https://github.com/
+- Built on [LettuceDetect](https://github.com/KRLabTech/LettuceDetect) framework
 - Uses [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base) architecture
 - Trained on [RAGTruth](https://github.com/ParticleMedia/RAGTruth) dataset
+- Data2txt augmentation from [DART](https://github.com/Yale-LILY/dart) and [E2E](https://github.com/tuetschek/e2e-dataset) datasets
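The README's Usage section is elided by the diff, but the hunk header shows the API shape: `spans = detector.predict(context, question, answer)`. A minimal sketch of the post-processing a token-classification hallucination detector typically performs, merging per-token 0/1 predictions into character-level spans; the function name and the offset/label format here are illustrative assumptions, not this model's actual interface:

```python
def tokens_to_spans(offsets, labels):
    """Merge runs of tokens labeled 1 (hallucinated) into character spans.

    offsets: per-token (start, end) character offsets, e.g. from a fast tokenizer.
    labels:  per-token 0/1 predictions from the token-classification head.
    """
    spans = []
    for (start, end), label in zip(offsets, labels):
        if label != 1:
            continue
        # Extend the previous span if this token touches it (allowing a
        # single-character gap, typically whitespace between word pieces).
        if spans and start <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans
```

With a Hugging Face fast tokenizer, the `offsets` input would come from tokenizing with `return_offsets_mapping=True`; the resulting spans can then be sliced directly out of the answer string.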
model.safetensors
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:a05c0312e98d364ff9bff9386f7cf015da60218405c09f15c4dc77a84af7f1aa
 size 598439784
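As a sanity check, the Hallucination F1 figures in the README's long-context table are consistent with the standard harmonic mean of the precision and recall values that appear in the diff; a generic sketch, not project code:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Figures from the long-context benchmark:
# 32K model: P=0.458, R=0.547 -> F1 ≈ 0.499
# 8K model:  P=0.529, R=0.056 -> F1 ≈ 0.101
```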