HuaminChen committed on
Commit b4dc1f5 · verified · 1 Parent(s): 96e67f1

Upload folder using huggingface_hub
Files changed (2):
  1. README.md +78 -75
  2. model.safetensors +1 -1
README.md CHANGED
@@ -1,57 +1,53 @@
  ---
- language:
- - en
  license: apache-2.0
- library_name: transformers
  tags:
- - hallucination-detection
- - modernbert
- - rag
- - fact-checking
- - token-classification
- - long-context
- - 32k
- - amd
- - rocm
- - mi300x
  datasets:
- - RAGTruth
- - llm-semantic-router/longcontext-haldetect
- base_model: llm-semantic-router/modernbert-base-32k
  pipeline_tag: token-classification
- metrics:
- - f1
  model-index:
- - name: modernbert-base-32k-haldetect
-   results:
-   - task:
-       type: token-classification
-       name: Hallucination Detection
-     dataset:
-       type: RAGTruth
-       name: RAGTruth Test Set
-     metrics:
-     - type: f1
-       value: 77.49
-       name: Example-Level F1
-     - type: f1
-       value: 51.47
-       name: Token-Level F1
-   - task:
-       type: token-classification
-       name: Long-Context Hallucination Detection
-     dataset:
-       type: llm-semantic-router/longcontext-haldetect
-       name: Long-Context Benchmark (8K-24K tokens)
-     metrics:
-     - type: f1
-       value: 49.86
-       name: Hallucination F1
  ---

  # 🥬 ModernBERT-base-32k Hallucination Detector

- A hallucination detection model fine-tuned on RAGTruth dataset using extended 32K context ModernBERT. **Specifically designed for long documents that exceed 8K tokens.**

  ## 🚀 Why 32K Context Matters

@@ -63,30 +59,27 @@ A hallucination detection model fine-tuned on RAGTruth dataset using extended 32

  ## Performance

- ### Long-Context Benchmark (8K-24K tokens)
-
- Evaluated on [`llm-semantic-router/longcontext-haldetect`](https://huggingface.co/datasets/llm-semantic-router/longcontext-haldetect) (337 test samples, avg 17,550 tokens):
-
- | Metric | 32K ModernBERT | 8K LettuceDetect | Improvement |
- |--------|---------------|------------------|-------------|
- | **Samples Truncated** | 0 (0%) | 320 (95%) | **-95%** |
- | Hallucination Precision | 0.458 | 0.529 | -13% |
- | Hallucination Recall | 0.547 | 0.056 | **+877%** |
- | **Hallucination F1** | **0.499** | 0.101 | **+393%** |
-
- **Key Finding**: The 8K model achieves only 5.6% recall because it truncates 95% of samples, losing critical evidence for hallucination detection.
-
  ### RAGTruth Benchmark (Standard, <3K tokens)

  Evaluated on RAGTruth test set (2,700 samples):

  | Metric | This Model | LettuceDetect BASE | LettuceDetect LARGE |
  |--------|------------|-------------------|---------------------|
- | **Example-Level F1** | **77.49%** ✅ | 75.99% | 79.22% |
- | Token-Level F1 | 51.47% | 56.27% | - |
  | Context Window | **32K** | 8K | 8K |

- ✅ **Matches LettuceDetect BASE** on short documents while supporting **4x longer context**

  ## Model Description

@@ -94,9 +87,10 @@ This model detects hallucinations in LLM-generated text by classifying each toke

  ### Key Features

- - **32K Context Window**: Built on [`llm-semantic-router/modernbert-base-32k`](https://huggingface.co/llm-semantic-router/modernbert-base-32k) with YaRN RoPE scaling
  - **Token-Level Classification**: Identifies specific spans that are hallucinated
  - **RAG Optimized**: Trained on RAGTruth benchmark for RAG applications
  - **Long Document Support**: Handles legal contracts, financial reports, research papers

  ## Usage
@@ -144,34 +138,42 @@ spans = detector.predict(context, question, answer)

  ## Training Details

- ### Dataset
- - **RAGTruth**: ~13,500 samples (QA, Data-to-Text, Summarization)
- - Train/Dev/Test split from original RAGTruth

  ### Configuration
  ```yaml
  base_model: llm-semantic-router/modernbert-base-32k
  max_length: 8192
- batch_size: 8
  learning_rate: 1e-5
  epochs: 6
- loss: CrossEntropyLoss
  scheduler: None (constant LR)
- early_stopping_patience: 3
  ```

  ### Hardware
  - **AMD Instinct MI300X GPU** (192GB HBM3) - Trained entirely on AMD ROCm
- - Training time: ~30 minutes (6 epochs, ~10K samples)
- - Framework: PyTorch 2.4 + HuggingFace Transformers on ROCm 7.0

  ## When to Use This Model

  | Use Case | Recommended Model |
- |----------|------------------|
  | Documents > 8K tokens | ✅ **This model** |
  | Multi-document RAG | ✅ **This model** |
  | Legal/Financial docs | ✅ **This model** |
  | Short QA (<3K tokens) | Either model works |
  | Speed critical | 8K model (faster) |

@@ -183,15 +185,15 @@ early_stopping_patience: 3

  ## Related Resources

- - **Long-Context Benchmark**: [`llm-semantic-router/longcontext-haldetect`](https://huggingface.co/datasets/llm-semantic-router/longcontext-haldetect) - 3,366 samples, 8K-24K tokens
- - **Base Model**: [`llm-semantic-router/modernbert-base-32k`](https://huggingface.co/llm-semantic-router/modernbert-base-32k) - Extended ModernBERT
- - **Combined Model**: [`modernbert-base-32k-haldetect-combined`](https://huggingface.co/llm-semantic-router/modernbert-base-32k-haldetect-combined) - Trained on RAGTruth + HaluEval

  ## Citation

  ```bibtex
  @misc{modernbert-32k-haldetect,
- title={Scaling Encoder-Based Hallucination Detection to 32K Tokens},
  author={LLM Semantic Router Team},
  year={2026},
  url={https://huggingface.co/llm-semantic-router/modernbert-base-32k-haldetect}
@@ -200,6 +202,7 @@ early_stopping_patience: 3

  ## Acknowledgments

- - Built on [LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect) framework
  - Uses [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base) architecture
  - Trained on [RAGTruth](https://github.com/ParticleMedia/RAGTruth) dataset
 
  ---
  license: apache-2.0
+ language:
+ - en
  tags:
+ - modernbert
+ - hallucination-detection
+ - rag
+ - fact-checking
+ - long-context
+ - 32k
+ - amd
+ - rocm
+ - mi300x
  datasets:
+ - llm-semantic-router/longcontext-haldetect
+ base_model:
+ - llm-semantic-router/modernbert-base-32k
  pipeline_tag: token-classification
  model-index:
+ - name: modernbert-base-32k-haldetect
+   results:
+   - task:
+       type: token-classification
+       name: Hallucination Detection
+     dataset:
+       name: RAGTruth Test Set
+       type: ragtruth
+     metrics:
+     - name: Example-Level F1
+       type: f1
+       value: 76.56
+     - name: Token-Level F1
+       type: f1
+       value: 53.77
+   - task:
+       type: token-classification
+       name: Long-Context Hallucination Detection
+     dataset:
+       name: Long-Context Benchmark (8K-24K tokens)
+       type: llm-semantic-router/longcontext-haldetect
+     metrics:
+     - name: Hallucination F1
+       type: f1
+       value: 49.86
  ---

  # 🥬 ModernBERT-base-32k Hallucination Detector

+ A hallucination detection model fine-tuned on the RAGTruth dataset with Data2txt augmentation, using extended 32K-context ModernBERT. **Specifically designed for long documents that exceed 8K tokens.**

  ## 🚀 Why 32K Context Matters

 
  ## Performance

  ### RAGTruth Benchmark (Standard, <3K tokens)

  Evaluated on RAGTruth test set (2,700 samples):

  | Metric | This Model | LettuceDetect BASE | LettuceDetect LARGE |
  |--------|------------|-------------------|---------------------|
+ | **Example-Level F1** | **76.56%** ✅ | 75.99% | 79.22% |
+ | Token-Level F1 | 53.77% | 56.27% | - |
  | Context Window | **32K** | 8K | 8K |

+ ✅ **Exceeds LettuceDetect BASE** on short documents while supporting **4x longer context**
+
+ ### Long-Context Benchmark (8K-24K tokens)
+
+ Evaluated on [llm-semantic-router/longcontext-haldetect](https://huggingface.co/datasets/llm-semantic-router/longcontext-haldetect) (337 test samples, avg 17,550 tokens):
+
+ | Metric | 32K ModernBERT | 8K LettuceDetect | Improvement |
+ |--------|----------------|------------------|-------------|
+ | **Samples Truncated** | 0 (0%) | 320 (95%) | **-95%** |
+ | Hallucination Recall | 0.547 | 0.056 | **+877%** |
+ | **Hallucination F1** | **0.499** | 0.101 | **+393%** |

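The Improvement column reports relative change between the two models' scores; for instance, the recall row works out as follows (a quick arithmetic sketch using the table's values):

```python
def relative_change(new: float, old: float) -> str:
    """Relative change from `old` to `new`, formatted like the Improvement column."""
    pct = (new / old - 1.0) * 100.0
    return f"{pct:+.0f}%"

# Hallucination recall: 0.547 (32K model) vs 0.056 (8K model)
print(relative_change(0.547, 0.056))  # → +877%
```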

  ## Model Description

  ### Key Features

+ - **32K Context Window**: Built on [llm-semantic-router/modernbert-base-32k](https://huggingface.co/llm-semantic-router/modernbert-base-32k) with YaRN RoPE scaling
  - **Token-Level Classification**: Identifies specific spans that are hallucinated
  - **RAG Optimized**: Trained on RAGTruth benchmark for RAG applications
+ - **Data2txt Augmentation**: Enhanced with DART and E2E datasets for better structured-data handling
  - **Long Document Support**: Handles legal contracts, financial reports, research papers

  ## Usage
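The usage snippet ends with `spans = detector.predict(context, question, answer)`: per-token hallucination labels are merged into character-level spans over the answer. A minimal sketch of that merging step (the helper name and the 0/1 label scheme are illustrative, not the package's exact internals):

```python
from typing import List, Tuple

def merge_token_spans(
    offsets: List[Tuple[int, int]],  # (start, end) char offsets per answer token
    labels: List[int],               # 1 = hallucinated token, 0 = supported
) -> List[Tuple[int, int]]:
    """Merge runs of consecutive hallucinated tokens into character-level spans."""
    spans = []
    current = None
    for (start, end), label in zip(offsets, labels):
        if label == 1:
            if current is None:
                current = [start, end]   # open a new span
            else:
                current[1] = end         # extend the open span
        elif current is not None:
            spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans

# Tokens 3 and 4 flagged as hallucinated → one merged span
offsets = [(0, 3), (4, 9), (10, 14), (15, 21), (22, 27)]
labels = [0, 0, 1, 1, 0]
print(merge_token_spans(offsets, labels))  # → [(10, 21)]
```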
 
  ## Training Details

+ ### Datasets
+
+ | Dataset | Samples | Task Type | Description |
+ |---------|---------|-----------|-------------|
+ | **RAGTruth** | 17,790 | QA, Summary, Data2txt | Human-annotated hallucination spans |
+ | **DART** | 2,000 | Data2txt | LLM-generated structured data responses |
+ | **E2E** | 1,500 | Data2txt | LLM-generated restaurant descriptions |
+ | **Total** | 21,290 | Mixed | Balanced task distribution |

  ### Configuration
+
  ```yaml
  base_model: llm-semantic-router/modernbert-base-32k
  max_length: 8192
+ batch_size: 32
  learning_rate: 1e-5
  epochs: 6
+ loss: CrossEntropyLoss (weighted)
  scheduler: None (constant LR)
+ early_stopping_patience: 4
  ```
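The `loss: CrossEntropyLoss (weighted)` entry implies per-class weights to counter the scarcity of hallucinated tokens. One common choice is inverse-frequency weighting (an assumption for illustration; the training script's exact weights are not shown here):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight classes by total / (n_classes * count): rare classes get larger weights."""
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {cls: total / (n_classes * cnt) for cls, cnt in sorted(counts.items())}

# Toy token stream: 90% supported (0), 10% hallucinated (1)
weights = inverse_frequency_weights([0] * 90 + [1] * 10)
# The rare hallucinated class is up-weighted (here to 5.0); such weights
# would then be passed to torch.nn.CrossEntropyLoss(weight=...).
```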

  ### Hardware
+
  - **AMD Instinct MI300X GPU** (192GB HBM3) - Trained entirely on AMD ROCm
+ - Training time: ~17 minutes (6 epochs)
+ - Framework: PyTorch 2.9 + HuggingFace Transformers on ROCm 7.0

  ## When to Use This Model

  | Use Case | Recommended Model |
+ |----------|-------------------|
  | Documents > 8K tokens | ✅ **This model** |
  | Multi-document RAG | ✅ **This model** |
  | Legal/Financial docs | ✅ **This model** |
+ | Structured data (tables, lists) | ✅ **This model** |
  | Short QA (<3K tokens) | Either model works |
  | Speed critical | 8K model (faster) |

 

  ## Related Resources

+ - **Long-Context Benchmark**: [llm-semantic-router/longcontext-haldetect](https://huggingface.co/datasets/llm-semantic-router/longcontext-haldetect) - 3,366 samples, 8K-24K tokens
+ - **Base Model**: [llm-semantic-router/modernbert-base-32k](https://huggingface.co/llm-semantic-router/modernbert-base-32k) - Extended ModernBERT
+ - **Combined Model**: [modernbert-base-32k-haldetect-combined](https://huggingface.co/llm-semantic-router/modernbert-base-32k-haldetect-combined) - Trained on RAGTruth + HaluEval

  ## Citation

  ```bibtex
  @misc{modernbert-32k-haldetect,
+ title={ModernBERT-32K Hallucination Detector with Data2txt Augmentation},
  author={LLM Semantic Router Team},
  year={2026},
  url={https://huggingface.co/llm-semantic-router/modernbert-base-32k-haldetect}
 

  ## Acknowledgments

+ - Built on [LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect) framework
  - Uses [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base) architecture
  - Trained on [RAGTruth](https://github.com/ParticleMedia/RAGTruth) dataset
+ - Data2txt augmentation from [DART](https://github.com/Yale-LILY/dart) and [E2E](https://github.com/tuetschek/e2e-dataset) datasets
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:978fe689645d0c530a9d9a1ab2f2421f7a07f5b1acdd943f98ce7ca8459053de
+ oid sha256:a05c0312e98d364ff9bff9386f7cf015da60218405c09f15c4dc77a84af7f1aa
  size 598439784