Upload folder using huggingface_hub

- README.md +78 -75
- model.safetensors +1 -1
README.md
CHANGED

@@ -1,57 +1,53 @@
 ---
-language:
-- en
 license: apache-2.0
+language:
+- en
 tags:
-- rag
-- fact-checking
-- mi300x
+- modernbert
+- hallucination-detection
+- rag
+- fact-checking
+- long-context
+- 32k
+- amd
+- rocm
+- mi300x
 datasets:
+- llm-semantic-router/longcontext-haldetect
+base_model:
+- llm-semantic-router/modernbert-base-32k
 pipeline_tag: token-classification
-metrics:
-- f1
 model-index:
-- name: modernbert-base-32k-haldetect
+- name: modernbert-base-32k-haldetect
+  results:
+  - task:
+      type: token-classification
+      name: Hallucination Detection
+    dataset:
+      name: RAGTruth Test Set
+      type: ragtruth
+    metrics:
+    - name: Example-Level F1
+      type: f1
+      value: 76.56
+    - name: Token-Level F1
+      type: f1
+      value: 53.77
+  - task:
+      type: token-classification
+      name: Long-Context Hallucination Detection
+    dataset:
+      name: Long-Context Benchmark (8K-24K tokens)
+      type: llm-semantic-router/longcontext-haldetect
+    metrics:
+    - name: Hallucination F1
+      type: f1
+      value: 49.86
 ---
 
 # 🥬 ModernBERT-base-32k Hallucination Detector
 
-A hallucination detection model fine-tuned on RAGTruth dataset using extended 32K context ModernBERT. **Specifically designed for long documents that exceed 8K tokens.**
+A hallucination detection model fine-tuned on RAGTruth dataset with Data2txt augmentation using extended 32K context ModernBERT. **Specifically designed for long documents that exceed 8K tokens.**
 
 ## 🚀 Why 32K Context Matters
 
@@ -63,30 +59,27 @@ A hallucination detection model fine-tuned on RAGTruth dataset using extended 32
 
 ## Performance
 
-### Long-Context Benchmark (8K-24K tokens)
-
-Evaluated on [`llm-semantic-router/longcontext-haldetect`](https://huggingface.co/datasets/llm-semantic-router/longcontext-haldetect) (337 test samples, avg 17,550 tokens):
-
-| Metric | 32K ModernBERT | 8K LettuceDetect | Improvement |
-|--------|---------------|------------------|-------------|
-| **Samples Truncated** | 0 (0%) | 320 (95%) | **-95%** |
-| Hallucination Precision | 0.458 | 0.529 | -13% |
-| Hallucination Recall | 0.547 | 0.056 | **+877%** |
-| **Hallucination F1** | **0.499** | 0.101 | **+393%** |
-
-**Key Finding**: The 8K model achieves only 5.6% recall because it truncates 95% of samples, losing critical evidence for hallucination detection.
-
 ### RAGTruth Benchmark (Standard, <3K tokens)
 
 Evaluated on RAGTruth test set (2,700 samples):
 
 | Metric | This Model | LettuceDetect BASE | LettuceDetect LARGE |
 |--------|------------|-------------------|---------------------|
-| **Example-Level F1** | **
-| Token-Level F1 |
+| **Example-Level F1** | **76.56%** ✅ | 75.99% | 79.22% |
+| Token-Level F1 | 53.77% | 56.27% | - |
 | Context Window | **32K** | 8K | 8K |
 
-✅ **
+✅ **Exceeds LettuceDetect BASE** on short documents while supporting **4x longer context**
+
+### Long-Context Benchmark (8K-24K tokens)
+
+Evaluated on [llm-semantic-router/longcontext-haldetect](https://huggingface.co/datasets/llm-semantic-router/longcontext-haldetect) (337 test samples, avg 17,550 tokens):
+
+| Metric | 32K ModernBERT | 8K LettuceDetect | Improvement |
+|--------|----------------|------------------|-------------|
+| **Samples Truncated** | 0 (0%) | 320 (95%) | **-95%** |
+| Hallucination Recall | 0.547 | 0.056 | **+877%** |
+| **Hallucination F1** | **0.499** | 0.101 | **+393%** |
 
 ## Model Description
 
@@ -94,9 +87,10 @@ This model detects hallucinations in LLM-generated text by classifying each toke
 
 ### Key Features
 
-- **32K Context Window**: Built on [
+- **32K Context Window**: Built on [llm-semantic-router/modernbert-base-32k](https://huggingface.co/llm-semantic-router/modernbert-base-32k) with YaRN RoPE scaling
 - **Token-Level Classification**: Identifies specific spans that are hallucinated
 - **RAG Optimized**: Trained on RAGTruth benchmark for RAG applications
+- **Data2txt Augmentation**: Enhanced with DART and E2E datasets for better structured data handling
 - **Long Document Support**: Handles legal contracts, financial reports, research papers
 
 ## Usage
 
@@ -144,34 +138,42 @@ spans = detector.predict(context, question, answer)
 
 ## Training Details
 
-###
-
+### Datasets
+
+| Dataset | Samples | Task Type | Description |
+|---------|---------|-----------|-------------|
+| **RAGTruth** | 17,790 | QA, Summary, Data2txt | Human-annotated hallucination spans |
+| **DART** | 2,000 | Data2txt | LLM-generated structured data responses |
+| **E2E** | 1,500 | Data2txt | LLM-generated restaurant descriptions |
+| **Total** | 21,290 | Mixed | Balanced task distribution |
 
 ### Configuration
+
 ```yaml
 base_model: llm-semantic-router/modernbert-base-32k
 max_length: 8192
-batch_size:
+batch_size: 32
 learning_rate: 1e-5
 epochs: 6
-loss: CrossEntropyLoss
+loss: CrossEntropyLoss (weighted)
 scheduler: None (constant LR)
-early_stopping_patience:
+early_stopping_patience: 4
 ```
 
 ### Hardware
+
 - **AMD Instinct MI300X GPU** (192GB HBM3) - Trained entirely on AMD ROCm
-- Training time: ~
-- Framework: PyTorch 2.
+- Training time: ~17 minutes (6 epochs)
+- Framework: PyTorch 2.9 + HuggingFace Transformers on ROCm 7.0
 
 ## When to Use This Model
 
 | Use Case | Recommended Model |
-|----------|------------------|
+|----------|-------------------|
 | Documents > 8K tokens | ✅ **This model** |
 | Multi-document RAG | ✅ **This model** |
 | Legal/Financial docs | ✅ **This model** |
+| Structured data (tables, lists) | ✅ **This model** |
 | Short QA (<3K tokens) | Either model works |
 | Speed critical | 8K model (faster) |
 
@@ -183,15 +185,15 @@ early_stopping_patience: 3
 
 ## Related Resources
 
-- **Long-Context Benchmark**: [
-- **Base Model**: [
-- **Combined Model**: [
+- **Long-Context Benchmark**: [llm-semantic-router/longcontext-haldetect](https://huggingface.co/datasets/llm-semantic-router/longcontext-haldetect) - 3,366 samples, 8K-24K tokens
+- **Base Model**: [llm-semantic-router/modernbert-base-32k](https://huggingface.co/llm-semantic-router/modernbert-base-32k) - Extended ModernBERT
+- **Combined Model**: [modernbert-base-32k-haldetect-combined](https://huggingface.co/llm-semantic-router/modernbert-base-32k-haldetect-combined) - Trained on RAGTruth + HaluEval
 
 ## Citation
 
 ```bibtex
 @misc{modernbert-32k-haldetect,
-  title={
+  title={ModernBERT-32K Hallucination Detector with Data2txt Augmentation},
   author={LLM Semantic Router Team},
   year={2026},
   url={https://huggingface.co/llm-semantic-router/modernbert-base-32k-haldetect}
@@ -200,6 +202,7 @@ early_stopping_patience: 3
 
 ## Acknowledgments
 
-- Built on [LettuceDetect](https://github.com/
+- Built on [LettuceDetect](https://github.com/KRLabTech/LettuceDetect) framework
 - Uses [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base) architecture
 - Trained on [RAGTruth](https://github.com/ParticleMedia/RAGTruth) dataset
+- Data2txt augmentation from [DART](https://github.com/Yale-LILY/dart) and [E2E](https://github.com/tuetschek/e2e-dataset) datasets
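The README's Usage section is elided by the diff, but the hunk header shows the API shape: `spans = detector.predict(context, question, answer)`. A minimal sketch of the post-processing a token-classification hallucination detector typically performs, merging per-token 0/1 predictions into character-level spans; the function name and the offset/label format here are illustrative assumptions, not this model's actual interface:

```python
def tokens_to_spans(offsets, labels):
    """Merge runs of tokens labeled 1 (hallucinated) into character spans.

    offsets: per-token (start, end) character offsets, e.g. from a fast tokenizer.
    labels:  per-token 0/1 predictions from the token-classification head.
    """
    spans = []
    for (start, end), label in zip(offsets, labels):
        if label != 1:
            continue
        # Extend the previous span if this token touches it (allowing a
        # single-character gap, typically whitespace between word pieces).
        if spans and start <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans
```

With a Hugging Face fast tokenizer, the `offsets` input would come from tokenizing with `return_offsets_mapping=True`; the resulting spans can then be sliced directly out of the answer string.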
model.safetensors
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:a05c0312e98d364ff9bff9386f7cf015da60218405c09f15c4dc77a84af7f1aa
 size 598439784
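As a sanity check, the Hallucination F1 figures in the README's long-context table are consistent with the standard harmonic mean of the precision and recall values that appear in the diff; a generic sketch, not project code:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Figures from the long-context benchmark:
# 32K model: P=0.458, R=0.547 -> F1 ≈ 0.499
# 8K model:  P=0.529, R=0.056 -> F1 ≈ 0.101
```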