Update hallucination_detection/README.md (#9)

- Update hallucination_detection/README.md (a2c62abb331e9b8e675286e5846fc5e2da54223e)
- Update hallucination_detection/README.md (ee2fff961d8fc097067d9b486e57eb0d92472a30)

Co-authored-by: Chulaka Gunasekara <cguna@users.noreply.huggingface.co>

hallucination_detection/README.md CHANGED
library_name: transformers
---

# Hallucination Detection Intrinsic

## Model Summary

This is a RAG-specific intrinsic fine-tuned for the hallucination detection task. Given a multi-turn conversation between a user and an AI assistant, ending with an assistant response, and a set of documents/passages on which that response is supposed to be based, the intrinsic generates a faithfulness likelihood and an explanation for each sentence of the last assistant response.

We have created two implementations of the intrinsic as LoRA adapters trained over granite-4.0-micro and gpt-oss-20b, respectively. This is the model card for the LoRA adapter trained over granite-4.0-micro. The model card for the LoRA adapter trained over gpt-oss-20b can be found [here](https://huggingface.co/ibm-granite/granitelib-rag-r1.0/blob/main/hallucination_detection/README.md).

- **Developer:** IBM Research
- **Model type:** LoRA adapter for [ibm-granite/granite-4.0-micro](https://huggingface.co/ibm-granite/granite-4.0-micro)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Intended use

This is a hallucination detection intrinsic that identifies the faithfulness likelihood of each sentence in the last assistant response of a multi-turn RAG conversation, based on a set of provided documents/passages.

> [!TIP]
> Note: While you can invoke the hallucination detection intrinsic directly, it is strongly recommended to call it through the [Mellea](https://mellea.ai) framework, which wraps the model with a tailored I/O processor, enabling a friendlier development interface. We next describe the input/output of the hallucination detection intrinsic when invoked through Mellea.

**Intrinsic input**: The hallucination detection intrinsic takes the following input:

- _Conversation:_ A list of conversational turns ending with the last user question, encoded as a list of user/assistant messages.
- _Assistant response:_ The assistant response to the last user question, i.e., the response for which faithfulness likelihoods will be generated, provided as a string.
- _Documents:_ A list of documents against which faithfulness should be judged, encoded as a collection of `Document` objects.

**Intrinsic output**: The output of the hallucination detection intrinsic contains the faithfulness likelihood for each sentence in the last assistant response. The output is a JSON array whose items include the begin/end character offsets and text of a response span, together with the faithfulness likelihood of the corresponding response sentence and the explanation for that likelihood.

**Going from input to output**: When calling the intrinsic through Mellea, the framework internally performs multiple steps to transform the intrinsic input into the corresponding output. While you do not have to invoke these steps explicitly, we provide a brief overview of the process. Given an input to the hallucination detection intrinsic, Mellea performs the following tasks:

- _Convert the user input to the format expected by the underlying hallucination detection model._ This includes, among other things, splitting the last assistant response and the documents into sentences and prepending them with sentence IDs, as well as adding an appropriate task-specific instruction.
- _Call the underlying hallucination detection model for inference._ The model generates output in a compact representation based on the sentence IDs of the last assistant response and documents.
- _Convert the model output to the final output._ The low-level raw model output is converted to the final output by, among other things, mapping sentence IDs back to response and document spans. The result is an application-friendly JSON format ready for consumption by downstream applications.
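The first of these steps can be illustrated with a small, self-contained sketch. Note that this is not Mellea's actual implementation: the regex-based sentence splitter and the `<r0>`/`<d0.0>` ID scheme are assumptions made purely for illustration.

```python
import re


def prepend_sentence_ids(response: str, documents: list[str]):
    """Illustrative sketch of the input-conversion step: split the last
    assistant response and each document into sentences and prefix each
    sentence with an ID. (Mellea's real splitter and ID scheme may differ.)"""

    def split(text: str) -> list[str]:
        # Naive sentence splitter: break after ., !, or ? followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    tagged_response = [f"<r{i}> {s}" for i, s in enumerate(split(response))]
    tagged_docs = [
        [f"<d{d}.{i}> {s}" for i, s in enumerate(split(doc))]
        for d, doc in enumerate(documents)
    ]
    return tagged_response, tagged_docs


resp, docs = prepend_sentence_ids(
    "Purple bumble fish are yellow. Green bumble fish are also yellow.",
    ["The only type of fish that is yellow is the purple bumble fish."],
)
print(resp)  # ['<r0> Purple bumble fish are yellow.', '<r1> Green bumble fish are also yellow.']
```

The model then reasons over these compact IDs rather than full sentences, which keeps its generated output short.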

## Example

You can find below an example of the input and the corresponding output of the hallucination detection intrinsic:
### Input

**Conversation:**

Assistant: Hello there, how can I help you?

User: Tell me about some yellow fish.

**Assistant response:**

Purple bumble fish are yellow. Green bumble fish are also yellow.

**Documents:**

The only type of fish that is yellow is the purple bumble fish.

### Output

```json
[
  {
    "response_begin": 0,
    "response_end": 31,
    "response_text": "Purple bumble fish are yellow. ",
    "faithfulness_likelihood": 0.7280580899614959,
    "explanation": "This sentence makes a factual claim about the color of purple bumble fish. The document states 'The only type of fish that is yellow is the purple bumble fish.' This directly supports the claim in the sentence."
  },
  {
    "response_begin": 31,
    "response_end": 65,
    "response_text": "Green bumble fish are also yellow.",
    "faithfulness_likelihood": 0.08655915695596529,
    "explanation": "This sentence makes a factual claim about the color of green bumble fish. However, the document does not mention green bumble fish at all. Therefore, this claim cannot be verified from the provided context."
  }
]
```
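A downstream application can consume this output with standard JSON tooling. The sketch below (explanations elided, and the 0.5 threshold an arbitrary illustrative choice, not a value prescribed by this card) checks that the span offsets index into the original response and flags low-faithfulness sentences:

```python
import json

# The intrinsic's output for the example above (explanations elided).
raw = """
[
  {"response_begin": 0, "response_end": 31,
   "response_text": "Purple bumble fish are yellow. ",
   "faithfulness_likelihood": 0.7280580899614959, "explanation": "..."},
  {"response_begin": 31, "response_end": 65,
   "response_text": "Green bumble fish are also yellow.",
   "faithfulness_likelihood": 0.08655915695596529, "explanation": "..."}
]
"""
spans = json.loads(raw)
response = "Purple bumble fish are yellow. Green bumble fish are also yellow."

# Each span's begin/end offsets slice the original response string exactly.
for span in spans:
    assert response[span["response_begin"]:span["response_end"]] == span["response_text"]

# Flag sentences whose faithfulness likelihood falls below a chosen threshold.
THRESHOLD = 0.5  # application-specific choice, for illustration only
flagged = [s["response_text"] for s in spans if s["faithfulness_likelihood"] < THRESHOLD]
print(flagged)  # ['Green bumble fish are also yellow.']
```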

## Quickstart

The recommended way to call this intrinsic is through the [Mellea](https://mellea.ai) framework. For code snippets demonstrating how to use this and other intrinsics, please refer to the [Mellea intrinsics examples](https://github.com/generative-computing/mellea/tree/main/docs/examples/intrinsics).

## Evaluation

We evaluated the hallucination detection intrinsic on the QA portion of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) benchmark, comparing response-level hallucination detection performance between the intrinsic and the methods reported in the RAGTruth paper. The results are shown in the table below.

| Model | Precision | Recall | F1 |
|---|---|---|---|
| GPT 4o mini (prompted) | 46.8 | 59.6 | 52.4 |
| GPT 4o (prompted) | 49.5 | 60.1 | 54.3 |
| Granite 4.0-micro (prompted) | 35.1 | 44.2 | 39.1 |
| GPT-OSS-20b (prompted) | 42.4 | 53.8 | 47.4 |
| GPT-OSS-120b (prompted) | 44.1 | 54.1 | 48.6 |
| Granite 4.0-micro-LoRA (hallucination detection intrinsic) | 56.8 | 75.1 | 64.7 |
| GPT-OSS-20b-LoRA (hallucination detection intrinsic) | 59.9 | 78.4 | 67.9 |

We observe that both hallucination detection intrinsics (LoRA adapters) outperform not only their corresponding base models prompted out of the box, but also larger prompted models.
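The F1 column above is the standard harmonic mean of precision and recall, which can be verified against the table directly:

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Check against the intrinsic rows of the table above.
print(round(f1(56.8, 75.1), 1))  # 64.7  (Granite 4.0-micro-LoRA)
print(round(f1(59.9, 78.4), 1))  # 67.9  (GPT-OSS-20b-LoRA)
```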

## Training Details

The hallucination detection intrinsic was trained on synthetically generated datasets. Generating the training data consisted of two main steps:

- _Multi-turn RAG conversation generation:_ Starting from publicly available document corpora, we generated a set of multi-turn RAG data consisting of multi-turn conversations grounded on passages retrieved from the corpora. For details on the RAG conversation generation process, please refer to the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf) and [Lee, Young-Suk, et al.](https://arxiv.org/pdf/2409.11500)
- _Hallucination detection:_ To create the faithfulness scores and explanations for the responses, we used a multi-step synthetic data generation pipeline.

This process resulted in ~50K data instances, which were used to train the LoRA adapter.

### Training Data

The following public datasets were used as seed datasets for the multi-turn RAG conversation generation process:

- [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
- [QuAC](https://huggingface.co/datasets/allenai/quac)

### Adapter Details

| Property | LoRA |
|---|---|
| **Base Model** | ibm-granite/granite-4.0-micro |
| **PEFT Type** | LORA |
| **Rank (r)** | 16 |
| **Alpha** | 32 |
| **Target Modules** | q_proj, k_proj, v_proj, o_proj |
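In [PEFT](https://huggingface.co/docs/peft) terms, the hyperparameters above correspond to a `LoraConfig` along these lines. This is a sketch for orientation only: values not stated in this card (e.g. `lora_dropout`, `task_type`) are illustrative assumptions, not the actual training configuration.

```python
from peft import LoraConfig

# LoRA settings from the Adapter Details table; dropout and task_type
# are assumptions not specified in this model card.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```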

**Infrastructure:**
We trained the hallucination detection granite-4.0-micro LoRA adapter on IBM's Vela cluster using 8 A100 GPUs.

**Ethical Considerations & Limitations:**
The model's outputs are not guaranteed to be factually accurate or complete. All outputs should be independently validated before use in decision-making or downstream applications. The model has been trained and evaluated on English data only.

## Resources

- ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
- 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- 💡 Learn about the latest Granite learning resources: https://github.com/ibm-granite/granite-guardian/tree/main/cookbooks

## Model Card Authors

[Chulaka Gunasekara](mailto:chulaka.gunasekara@ibm.com)