Update hallucination_detection/README.md (#9)

- Update hallucination_detection/README.md (a2c62abb331e9b8e675286e5846fc5e2da54223e)
- Update hallucination_detection/README.md (ee2fff961d8fc097067d9b486e57eb0d92472a30)

Co-authored-by: Chulaka Gunasekara <cguna@users.noreply.huggingface.co>

hallucination_detection/README.md CHANGED
library_name: transformers
---

# Hallucination Detection Intrinsic

## Model Summary

This is a RAG-specific intrinsic fine-tuned for the hallucination detection task. Given a multi-turn conversation between a user and an AI assistant, ending with an assistant response, and a set of documents/passages on which that response is supposed to be based, the intrinsic generates a faithfulness likelihood and an explanation for each sentence of the last assistant response.

We have created two implementations of the intrinsic as LoRA adapters trained over granite-4.0-micro and gpt-oss-20b, respectively. This is the model card for the LoRA adapter trained over granite-4.0-micro. The model card for the LoRA adapter trained over gpt-oss-20b can be found [here](https://huggingface.co/ibm-granite/granitelib-rag-r1.0/blob/main/hallucination_detection/README.md).

- **Developer:** IBM Research
- **Model type:** LoRA adapter for [ibm-granite/granite-4.0-micro](https://huggingface.co/ibm-granite/granite-4.0-micro)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Intended use

This is a hallucination detection intrinsic that identifies the faithfulness likelihood of each sentence in the last assistant response of a multi-turn RAG conversation, based on a set of provided documents/passages.

> [!TIP]
> Note: While you can invoke the hallucination detection intrinsic directly, it is strongly recommended to call it through the [Mellea](https://mellea.ai) framework, which wraps the model with a tailored I/O processor, enabling a friendlier development interface. We next describe the input/output of the hallucination detection intrinsic when invoked through Mellea.

**Intrinsic input**: The hallucination detection intrinsic takes the following input:

- _Conversation:_ A list of conversational turns ending with the last user question, encoded as a list of user/assistant messages.
- _Assistant response:_ The assistant response to the last user question, i.e., the response for which faithfulness likelihoods will be generated, provided as a string.
- _Documents:_ A list of documents against which faithfulness should be judged, encoded as a collection of `Document` objects.

**Intrinsic output**: The output of the hallucination detection intrinsic contains the faithfulness likelihood for each sentence in the last assistant response. The output is a JSON array whose items include the begin/end character offsets and text of a response span, together with the faithfulness likelihood of the corresponding response sentence and the explanation for that likelihood.

**Going from input to output**: When calling the intrinsic through Mellea, the framework internally performs multiple steps to transform the intrinsic input into the corresponding output. While you do not have to invoke these steps explicitly, we provide a brief overview of the process. Given an input to the hallucination detection intrinsic, Mellea performs the following tasks:

- _Convert the user input to the format expected by the underlying hallucination detection model._ This includes, among other things, splitting the last assistant response and the documents into sentences and prepending them with sentence IDs, as well as adding an appropriate task-specific instruction.
- _Call the underlying hallucination detection model for inference._ The model generates output in a compact representation based on the sentence IDs of the last assistant response and documents.
- _Convert the model output to the final output._ The low-level raw model output is converted to the final output by, among other things, mapping sentence IDs back to response and document spans. The result is an application-friendly JSON format ready for consumption by downstream applications.
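The first of these steps can be illustrated with a small, self-contained sketch. Note that this is not Mellea's actual implementation: the regex-based sentence splitter and the `<r0>`/`<d0.0>` ID scheme are assumptions made purely for illustration.

```python
import re


def prepend_sentence_ids(response: str, documents: list[str]):
    """Illustrative sketch of the input-conversion step: split the last
    assistant response and each document into sentences and prefix each
    sentence with an ID. (Mellea's real splitter and ID scheme may differ.)"""

    def split(text: str) -> list[str]:
        # Naive sentence splitter: break after ., !, or ? followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    tagged_response = [f"<r{i}> {s}" for i, s in enumerate(split(response))]
    tagged_docs = [
        [f"<d{d}.{i}> {s}" for i, s in enumerate(split(doc))]
        for d, doc in enumerate(documents)
    ]
    return tagged_response, tagged_docs


resp, docs = prepend_sentence_ids(
    "Purple bumble fish are yellow. Green bumble fish are also yellow.",
    ["The only type of fish that is yellow is the purple bumble fish."],
)
print(resp)  # ['<r0> Purple bumble fish are yellow.', '<r1> Green bumble fish are also yellow.']
```

The model then reasons over these compact IDs rather than full sentences, which keeps its generated output short.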

## Example

You can find below an example of the input and the corresponding output of the hallucination detection intrinsic:
### Input

**Conversation:**

Assistant: Hello there, how can I help you?

User: Tell me about some yellow fish.

**Assistant response:**

Purple bumble fish are yellow. Green bumble fish are also yellow.

**Documents:**

The only type of fish that is yellow is the purple bumble fish.

### Output

```json
[
  {
    "response_begin": 0,
    "response_end": 31,
    "response_text": "Purple bumble fish are yellow. ",
    "faithfulness_likelihood": 0.7280580899614959,
    "explanation": "This sentence makes a factual claim about the color of purple bumble fish. The document states 'The only type of fish that is yellow is the purple bumble fish.' This directly supports the claim in the sentence."
  },
  {
    "response_begin": 31,
    "response_end": 65,
    "response_text": "Green bumble fish are also yellow.",
    "faithfulness_likelihood": 0.08655915695596529,
    "explanation": "This sentence makes a factual claim about the color of green bumble fish. However, the document does not mention green bumble fish at all. Therefore, this claim cannot be verified from the provided context."
  }
]
```
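A downstream application can consume this output with standard JSON tooling. The sketch below (explanations elided, and the 0.5 threshold an arbitrary illustrative choice, not a value prescribed by this card) checks that the span offsets index into the original response and flags low-faithfulness sentences:

```python
import json

# The intrinsic's output for the example above (explanations elided).
raw = """
[
  {"response_begin": 0, "response_end": 31,
   "response_text": "Purple bumble fish are yellow. ",
   "faithfulness_likelihood": 0.7280580899614959, "explanation": "..."},
  {"response_begin": 31, "response_end": 65,
   "response_text": "Green bumble fish are also yellow.",
   "faithfulness_likelihood": 0.08655915695596529, "explanation": "..."}
]
"""
spans = json.loads(raw)
response = "Purple bumble fish are yellow. Green bumble fish are also yellow."

# Each span's begin/end offsets slice the original response string exactly.
for span in spans:
    assert response[span["response_begin"]:span["response_end"]] == span["response_text"]

# Flag sentences whose faithfulness likelihood falls below a chosen threshold.
THRESHOLD = 0.5  # application-specific choice, for illustration only
flagged = [s["response_text"] for s in spans if s["faithfulness_likelihood"] < THRESHOLD]
print(flagged)  # ['Green bumble fish are also yellow.']
```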

## Quickstart

The recommended way to call this intrinsic is through the [Mellea](https://mellea.ai) framework. For code snippets demonstrating how to use this and other intrinsics, please refer to the [Mellea intrinsics examples](https://github.com/generative-computing/mellea/tree/main/docs/examples/intrinsics).

## Evaluation

We evaluated the hallucination detection intrinsic on the QA portion of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) benchmark, comparing response-level hallucination detection performance between the intrinsic and the methods reported in the RAGTruth paper. The results are shown in the table below.

| Model | Precision | Recall | F1 |
|---|---|---|---|
| GPT 4o mini (prompted) | 46.8 | 59.6 | 52.4 |
| GPT 4o (prompted) | 49.5 | 60.1 | 54.3 |
| Granite 4.0-micro (prompted) | 35.1 | 44.2 | 39.1 |
| GPT-OSS-20b (prompted) | 42.4 | 53.8 | 47.4 |
| GPT-OSS-120b (prompted) | 44.1 | 54.1 | 48.6 |
| Granite 4.0-micro-LoRA (hallucination detection intrinsic) | 56.8 | 75.1 | 64.7 |
| GPT-OSS-20b-LoRA (hallucination detection intrinsic) | 59.9 | 78.4 | 67.9 |

We observe that both hallucination detection intrinsics (LoRA adapters) outperform not only their corresponding base models prompted out of the box, but also larger prompted models.
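The F1 column above is the standard harmonic mean of precision and recall, which can be verified against the table directly:

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Check against the intrinsic rows of the table above.
print(round(f1(56.8, 75.1), 1))  # 64.7  (Granite 4.0-micro-LoRA)
print(round(f1(59.9, 78.4), 1))  # 67.9  (GPT-OSS-20b-LoRA)
```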

## Training Details

The hallucination detection intrinsic was trained on synthetically generated datasets. Generating the training data consisted of two main steps:

- _Multi-turn RAG conversation generation:_ Starting from publicly available document corpora, we generated a set of multi-turn RAG data consisting of multi-turn conversations grounded on passages retrieved from the corpora. For details on the RAG conversation generation process, please refer to the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf) and [Lee, Young-Suk, et al.](https://arxiv.org/pdf/2409.11500)
- _Hallucination detection:_ To create the faithfulness scores and explanations for the responses, we used a multi-step synthetic data generation pipeline.

This process resulted in ~50K data instances, which were used to train the LoRA adapter.

### Training Data

The following public datasets were used as seed datasets for the multi-turn RAG conversation generation process:

- [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
- [QuAC](https://huggingface.co/datasets/allenai/quac)

### Adapter Details

| Property | LoRA |
|---|---|
| **Base Model** | ibm-granite/granite-4.0-micro |
| **PEFT Type** | LORA |
| **Rank (r)** | 16 |
| **Alpha** | 32 |
| **Target Modules** | q_proj, k_proj, v_proj, o_proj |
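In [PEFT](https://huggingface.co/docs/peft) terms, the hyperparameters above correspond to a `LoraConfig` along these lines. This is a sketch for orientation only: values not stated in this card (e.g. `lora_dropout`, `task_type`) are illustrative assumptions, not the actual training configuration.

```python
from peft import LoraConfig

# LoRA settings from the Adapter Details table; dropout and task_type
# are assumptions not specified in this model card.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```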

**Infrastructure:**
We trained the hallucination detection granite-4.0-micro LoRA adapter on IBM's Vela cluster using 8 A100 GPUs.

**Ethical Considerations & Limitations:**
The model's outputs are not guaranteed to be factually accurate or complete. All outputs should be independently validated before use in decision-making or downstream applications. The model has been trained and evaluated on English data only.

## Resources

- ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
- 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- 💡 Learn about the latest Granite learning resources: https://github.com/ibm-granite/granite-guardian/tree/main/cookbooks

## Model Card Authors

[Chulaka Gunasekara](mailto:chulaka.gunasekara@ibm.com)