pronics2004 committed on
Commit
aca70a8
·
1 Parent(s): c085b22

- Update context-attribution README with new model card (185ae0f2dc1bc6b1cb75cc74e58e9a9dde2d8bf8)

context-attribution/granite-4.0-micro/README.md CHANGED
@@ -4,7 +4,7 @@
4
 
5
 **Context Attribution** is a LoRA adapter purpose-built for [ibm-granite/granite-4.0-micro](https://huggingface.co/ibm-granite/granite-4.0-micro), predicting which sentences in the context were most important for granite-4.0-micro when generating each sentence of its response. Here, context includes previous conversation turns as well as any documents provided to the granite-4.0-micro model. The context attribution LoRA thus helps to explain granite-4.0-micro's behavior, specifically how its probability of generating a certain response sentence is affected by different parts of the context.
6
 
7
- The context attribution model takes the form of a LoRA adapter (standard LoRA, always active). The adapter greatly improves granite-4.0-micro's ability to attribute to context (see Evaluation section below) while retaining other capabilities of the base model. The context attribution adaptor identifies which context sentences actually influenced a model’s response, while IBM Granite’s citation generation intrinsic highlights sentences that support the response regardless of whether the model used them.
8
 
9
  - **Developer:** IBM Research
10
  - **HF Collection:**
@@ -17,11 +17,11 @@ The context attribution model takes the form of a LoRA adapter (standard LoRA, a
17
 
18
  ## Usage
19
 
20
- **Intended use:** Granite 4.0 Micro Context Attribution is a context attribution adaptor for IBM's [granite-4.0-micro](https://huggingface.co/ibm-granite/granite-4.0-micro) LLM. It enables the base model to accurately identify which sentences in the context (including previous conversation turns and documents) were most important to the model when generating each sentence in its response. The intrinsic thus helps to explain the base model's behavior, specifically how its probability of generating a certain response sentence is affected by parts of the context. This intrinsic is designed to be used as part of the Granite inference pipeline. It is intended to be called after the base model generates a response to provide a post hoc explanation.
21
 
22
  **Contrast with Granite 4.0 Micro Citations:** The context attribution adapter is similar to IBM Granite’s citation generation intrinsic in that it uses a comparable input/output format and the same training data. However, the two differ in the type of attribution they provide. Citation generation provides corroborative attribution, identifying document sentences that best support a response regardless of whether the model relied on them, making it model-agnostic. In contrast, the context attribution adapter provides contributive attribution, identifying the context sentences that actually influenced a specific model’s response. It also includes prior conversation turns (not just documents) and ranks context sentences by importance rather than listing them unordered.
23
 
24
- The context attribution adaptor for Granite 4.0 Micro is designed specifically for this model and was trained on its behavior. While it may be applied to other LLMs, it has not been validated for them, and its attributions may not always align with human judgments of which context sentences should matter.
25
 
26
 **Input Format:** The context attribution intrinsic expects the input to be processed as follows: 1) The last assistant response (the response to be attributed) should be split into sentences and the sentences numbered with tags as follows: `"<r0> sentence 0 <r1> sentence 1 ... "`. 2) The context (documents and previous conversation turns) should also be split into sentences and tagged as `"<c0> sentence 0 <c1> sentence 1 ... "`. The numbering of context sentences starts with the first document (if present) and continues as a single sequence through all the documents and then previous conversation turns, in that order. The ability to perform this input processing automatically will soon be available from IBM's [mellea](https://github.com/generative-computing/mellea) package (more specifically, it will be provided by the [granite-common](https://github.com/ibm-granite/granite-common) package that will be integrated into `mellea`). The Quickstart Example below shows approximately how this automated input processing will be called. Alternatively, the user can insert sentence tags manually or by other means (also shown in the Quickstart Example below).
27
 
@@ -119,13 +119,33 @@ print(output_text)
119
  # This output could be post-processed using `granite-common`, for example to display or highlight the sentence texts.
120
  ```
121
122
  ## Training Details
123
 
124
  Granite 4.0 Micro Context Attribution is a LoRA adapter trained to approximate the importance ranking of context sentences provided by the MExGen method in [[Monteiro Paes and Wei et al., ACL 2025] Multi-Level Explanations for Generative Language Models](https://aclanthology.org/2025.acl-long.1553/).
125
 
126
- **Training Data:** The adaptor was trained on a mixture of two datasets, [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial) and [QUAC](https://huggingface.co/datasets/allenai/quac), which are both for document-grounded, multi-turn question answering. This mixture of datasets, which will be referred to as **MD2D-QUAC**, was also used to train IBM Granite's [citation generation intrinsic](https://huggingface.co/ibm-granite/granite-lib-rag-r1.0/blob/main/citations/README.md). 2053 question-answer conversation rounds were used for training and 1024 for validation. These numbers of instances were deliberately chosen to be modest to demonstrate that a well-performing intrinsic can be trained using this limited amount of data.
127
 
128
- The MultiDoc2Dial and QUAC datasets consist of sets of grounding documents and multi-turn question-answering conversations based on the set of documents. For training the context attribution intrinsic, each conversation round (consisting of one user question followed by an assistant response), together with any conversation rounds that precede it, was treated as a separate "instance". The context for each instance includes the documents for the conversation as well as all conversation rounds except the last round, which is treated as the current round for the instance. granite-4.0-micro was called to re-generate a response to the question in the current round (since the purpose of the intrinsic is to attribute granite-4.0-micro's responses). The leave-one-out (LOO) variant of the [MExGen method](https://aclanthology.org/2025.acl-long.1553/) was then used to attribute each response sentence to context sentences, yielding "gold" attribution scores. LOO is thus treated as the "skyline" method in the Evaluation section below. The LOO attribution scores were thresholded to produce a controlled-length list of context sentences, specifically a list of context sentence numbers in decreasing order of importance. The context attribution intrinsic was trained to reproduce these lists for all response sentences, given the context, response, and an instruction as input.
129
 
130
  ### Adapter Configurations
131
 
@@ -139,27 +159,9 @@ The MultiDoc2Dial and QUAC datasets consist of sets of grounding documents and m
139
  | Max completion tokens | 4096 |
140
  | KV cache | Supported |
141
 
142
- **Evaluation:** The context attribution intrinsic was evaluated on four datasets: MD2D-QUAC (a test subset not used in training), ELI5, CNN/Daily Mail (CNN/DM), and XSum. The first two are multi-turn QA datasets and the last two are summarization datasets. MD2D-QUAC evaluates the intrinsic on a test split of the dataset used for training, ELI5 evaluates generalization to a different dataset within the same task, and CNN/DM and XSum evaluate generalization to a different task.
143
-
144
- The evaluation metric is the area under the perturbation curve (AUPC), a standard metric for evaluating the faithfulness of a feature-attribution-like explanation to the model being explained. Please see the [MExGen paper](https://aclanthology.org/2025.acl-long.1553/) for more details on AUPC. Here, the perturbation curves are first multiplied by a linearly decreasing weight function, which decreases from 1 at 0% perturbation to 0 at 20% perturbation, yielding the weighted area under the perturbation curve (WAUPC).
145
-
146
- | method | MD2D-QUAC | ELI5 | CNN/DM | XSum | average |
147
- |:---------------------------|------------:|-------:|---------:|-------:|----------:|
148
- | LOO *(skyline)* | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
149
- | LOO thresholded *(realistic skyline)* | 0.9780 | 0.9812 | 0.9976 | 0.9983 | 0.9888 |
150
- | prompt 0-shot | 0.1476 | 0.1657 | 0.1629 | 0.2369 | 0.1783 |
151
- | prompt 1-shot | 0.1982 | 0.2171 | 0.1693 | 0.2014 | 0.1965 |
152
- | prompt GPT-OSS-120B 0-shot | 0.8619 | 0.8063 | 0.8735 | 0.6679 | 0.8024 |
153
- | prompt GPT-OSS-120B 1-shot | 0.8779 | 0.8498 | 0.8996 | 0.8642 | 0.8729 |
154
- | **context attribution LoRA** | **0.9360** | **0.9045** | **0.9158** | **0.9223** | **0.9197** |
155
-
156
- The first row of the table shows the WAUPC for the LOO variant of [MExGen](https://aclanthology.org/2025.acl-long.1553/), which the intrinsic aims to approximate. The second row corresponds to a thresholded version of LOO, which is the one that actually generates training data for the intrinsic. LOO and thresholded LOO thus represent "skyline" and "realistic skyline" methods that the intrinsic ideally would match. The WAUPC values in the table have been normalized by dividing by the WAUPC of LOO. As seen in the last row of the table, the intrinsic does well in approximating LOO, attaining at least 90% of its WAUPC on all datasets and 92.0% averaged across datasets.
157
-
158
- The table compares the context attribution intrinsic to baselines that prompt an LLM to perform the context attribution task. These include prompting the granite-4.0-micro base model (labelled simply as "prompt" in the table), with and without providing an example (0-shot/1-shot), and prompting GPT-OSS-120B, also with 0-shot or 1-shot. (2-shot and 3-shot prompting were also evaluated but did not improve upon 1-shot, so are omitted from this table for brevity.) It is clear that the context attribution intrinsic greatly improves granite-4.0-micro's ability to attribute its own responses to context. The intrinsic also outperforms GPT-OSS-120B, an LLM with around 40X the number of parameters (the LoRA adapter contributes a negligible number of parameters to the granite-4.0-micro base model).
159
-
160
- **Infrastructure:**
161
 
162
- **Ethical Consideration:**
163
 
164
  ## Resources
165
 
 
4
 
5
 **Context Attribution** is a LoRA adapter purpose-built for [ibm-granite/granite-4.0-micro](https://huggingface.co/ibm-granite/granite-4.0-micro), predicting which sentences in the context were most important for granite-4.0-micro when generating each sentence of its response. Here, context includes previous conversation turns as well as any documents provided to the granite-4.0-micro model. The context attribution LoRA thus helps to explain granite-4.0-micro's behavior, specifically how its probability of generating a certain response sentence is affected by different parts of the context.
6
 
7
+ The context attribution model takes the form of a LoRA adapter (standard LoRA, always active). The adapter greatly improves granite-4.0-micro's ability to attribute to context (see Evaluation section below) while retaining other capabilities of the base model. The context attribution adapter identifies which context sentences actually influenced a model’s response, while IBM Granite’s citation generation intrinsic highlights sentences that support the response regardless of whether the model used them.
8
 
9
  - **Developer:** IBM Research
10
  - **HF Collection:**
 
17
 
18
  ## Usage
19
 
20
+ **Intended use:** Granite 4.0 Micro Context Attribution is a context attribution adapter for IBM's [granite-4.0-micro](https://huggingface.co/ibm-granite/granite-4.0-micro) LLM. It enables the base model to accurately identify which sentences in the context (including previous conversation turns and documents) were most important to the model when generating each sentence in its response. The intrinsic thus helps to explain the base model's behavior, specifically how its probability of generating a certain response sentence is affected by parts of the context. This intrinsic is designed to be used as part of the Granite inference pipeline. It is intended to be called after the base model generates a response to provide a post hoc explanation.
21
 
22
  **Contrast with Granite 4.0 Micro Citations:** The context attribution adapter is similar to IBM Granite’s citation generation intrinsic in that it uses a comparable input/output format and the same training data. However, the two differ in the type of attribution they provide. Citation generation provides corroborative attribution, identifying document sentences that best support a response regardless of whether the model relied on them, making it model-agnostic. In contrast, the context attribution adapter provides contributive attribution, identifying the context sentences that actually influenced a specific model’s response. It also includes prior conversation turns (not just documents) and ranks context sentences by importance rather than listing them unordered.
23
 
24
+ The context attribution adapter for Granite 4.0 Micro is designed specifically for this model and was trained on its behavior. While it may be applied to other LLMs, it has not been validated for them. In addition, its attributions may not always align with human judgments of which context sentences should matter.
25
 
26
 **Input Format:** The context attribution intrinsic expects the input to be processed as follows: 1) The last assistant response (the response to be attributed) should be split into sentences and the sentences numbered with tags as follows: `"<r0> sentence 0 <r1> sentence 1 ... "`. 2) The context (documents and previous conversation turns) should also be split into sentences and tagged as `"<c0> sentence 0 <c1> sentence 1 ... "`. The numbering of context sentences starts with the first document (if present) and continues as a single sequence through all the documents and then previous conversation turns, in that order. The ability to perform this input processing automatically will soon be available from IBM's [mellea](https://github.com/generative-computing/mellea) package (more specifically, it will be provided by the [granite-common](https://github.com/ibm-granite/granite-common) package that will be integrated into `mellea`). The Quickstart Example below shows approximately how this automated input processing will be called. Alternatively, the user can insert sentence tags manually or by other means (also shown in the Quickstart Example below).
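For readers who want to insert the sentence tags manually before the `granite-common` tooling is available, the tagging step can be sketched as follows. This is an illustrative sketch only: the regex-based sentence splitter here is naive (it splits on terminal punctuation followed by whitespace), whereas the real pipeline will use the splitter provided by `granite-common`/`mellea`.

```python
import re

def tag_sentences(text: str, prefix: str) -> str:
    """Split text into sentences (naive regex) and prepend <prefix+index> tags."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return " ".join(f"<{prefix}{i}> {s}" for i, s in enumerate(sentences))

context = "The sky is blue. Water boils at 100 C."
response = "The sky appears blue because of Rayleigh scattering."

tagged_context = tag_sentences(context, "c")
# "<c0> The sky is blue. <c1> Water boils at 100 C."
tagged_response = tag_sentences(response, "r")
# "<r0> The sky appears blue because of Rayleigh scattering."
```

When multiple documents and prior turns are present, their sentences share a single `<cN>` numbering sequence (documents first, then earlier conversation turns), per the format described above.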
27
 
 
119
  # This output could be post-processed using `granite-common`, for example to display or highlight the sentence texts.
120
  ```
121
 
122
+ ## Evaluation
123
+
124
+ The context attribution intrinsic was evaluated on four datasets: MD2D-QUAC, ELI5, CNN/Daily Mail (CNN/DM), and XSum. The first two are document-grounded, multi-turn question answering datasets and the last two are summarization datasets. MD2D-QUAC evaluates the intrinsic on a held-out test split of the dataset used for training (please see "Training Data" below for more details), ELI5 evaluates generalization to a different dataset within the same task, and CNN/DM and XSum evaluate generalization to a different task.
125
+
126
+ The evaluation metric is the area under the perturbation curve (AUPC), a standard metric for evaluating the faithfulness of a feature-attribution-like explanation to the model being explained. Please see the [MExGen paper](https://aclanthology.org/2025.acl-long.1553/) for more details on AUPC. Here, the perturbation curves are first multiplied by a linearly decreasing weight function, which decreases from 1 at 0% perturbation to 0 at 20% perturbation, yielding the weighted area under the perturbation curve (WAUPC).
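The weighting scheme can be made concrete with a small sketch. This is not the evaluation code used for the table below (the exact perturbation-curve construction follows the MExGen paper); it only illustrates how a linearly decreasing weight, falling from 1 at 0% perturbation to 0 at 20%, turns an ordinary AUPC into a WAUPC via the trapezoid rule.

```python
def waupc(perturb_fracs, curve_vals, max_frac=0.2):
    """Weighted area under a perturbation curve.

    perturb_fracs: increasing perturbation fractions in [0, max_frac]
    curve_vals:    curve value at each perturbation fraction
    The weight decreases linearly from 1 at 0% to 0 at max_frac (20%).
    """
    weighted = [v * max(0.0, 1.0 - p / max_frac)
                for p, v in zip(perturb_fracs, curve_vals)]
    area = 0.0
    for i in range(1, len(perturb_fracs)):
        dp = perturb_fracs[i] - perturb_fracs[i - 1]
        area += 0.5 * (weighted[i] + weighted[i - 1]) * dp  # trapezoid rule
    return area

# A flat curve of 1.0 yields the maximum weighted area, a triangle of area
# 0.5 * 0.2 * 1 = 0.1:
print(waupc([0.0, 0.05, 0.10, 0.15, 0.20], [1.0] * 5))  # 0.1
```

Because the weight vanishes beyond 20% perturbation, the metric emphasizes faithfulness of the top-ranked context sentences, which is what the intrinsic is trained to get right.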
127
+
128
+ | method | MD2D-QUAC | ELI5 | CNN/DM | XSum | average |
129
+ |:---------------------------|------------:|-------:|---------:|-------:|----------:|
130
+ | LOO *(skyline)* | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
131
+ | LOO thresholded *(realistic skyline)* | 0.9780 | 0.9812 | 0.9976 | 0.9983 | 0.9888 |
132
+ | prompt 0-shot | 0.1476 | 0.1657 | 0.1629 | 0.2369 | 0.1783 |
133
+ | prompt 1-shot | 0.1982 | 0.2171 | 0.1693 | 0.2014 | 0.1965 |
134
+ | prompt GPT-OSS-120B 0-shot | 0.8619 | 0.8063 | 0.8735 | 0.6679 | 0.8024 |
135
+ | prompt GPT-OSS-120B 1-shot | 0.8779 | 0.8498 | 0.8996 | 0.8642 | 0.8729 |
136
+ | **context attribution LoRA** | **0.9360** | **0.9045** | **0.9158** | **0.9223** | **0.9197** |
137
+
138
+ The first row of the table shows the WAUPC for the leave-one-out (LOO) variant of the [MExGen method](https://aclanthology.org/2025.acl-long.1553/), which the intrinsic aims to approximate. The second row corresponds to a thresholded version of LOO, which is the one that actually generates training data for the intrinsic. LOO and thresholded LOO thus represent "skyline" and "realistic skyline" methods that the intrinsic ideally would match. The WAUPC values in the table have been normalized by dividing by the WAUPC of LOO. As seen in the last row of the table, the intrinsic does well in approximating LOO, attaining at least 90% of its WAUPC on all datasets and 92.0% averaged across datasets.
139
+
140
+ The table compares the context attribution intrinsic to baselines that prompt an LLM to perform the context attribution task. These include prompting the granite-4.0-micro base model (labelled simply as "prompt" in the table), with and without providing an example (0-shot/1-shot), and prompting GPT-OSS-120B, also with 0-shot or 1-shot. (2-shot and 3-shot prompting were also evaluated but did not improve upon 1-shot, so are omitted from this table for brevity.) It is clear that the context attribution intrinsic greatly improves granite-4.0-micro's ability to attribute its own responses to context. The intrinsic also outperforms GPT-OSS-120B, an LLM with around 40X the number of parameters (the LoRA adapter contributes a negligible number of parameters to the granite-4.0-micro base model).
141
+
142
  ## Training Details
143
 
144
  Granite 4.0 Micro Context Attribution is a LoRA adapter trained to approximate the importance ranking of context sentences provided by the MExGen method in [[Monteiro Paes and Wei et al., ACL 2025] Multi-Level Explanations for Generative Language Models](https://aclanthology.org/2025.acl-long.1553/).
145
 
146
+ **Training Data:** The adapter was trained on a mixture of two datasets, [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial) and [QUAC](https://huggingface.co/datasets/allenai/quac), which are both for document-grounded, multi-turn question answering. This mixture of datasets, referred to as **MD2D-QUAC**, was also used to train IBM Granite's [citation generation intrinsic](https://huggingface.co/ibm-granite/granite-lib-rag-r1.0/blob/main/citations/README.md). 2053 question-answer conversation rounds were used for training and 1024 for validation. These numbers of instances were deliberately chosen to be modest to demonstrate that a well-performing intrinsic can be trained using this limited amount of data.
147
 
148
+ The MultiDoc2Dial and QUAC datasets consist of sets of grounding documents and multi-turn question-answering conversations based on the set of documents. For training the context attribution intrinsic, each conversation round (consisting of one user question followed by an assistant response), together with any conversation rounds that precede it, was treated as a separate "instance". The context for each instance includes the documents for the conversation as well as all conversation rounds except the last round, which is treated as the current round for the instance. granite-4.0-micro was called to re-generate a response to the question in the current round (since the purpose of the intrinsic is to attribute granite-4.0-micro's responses). The leave-one-out (LOO) variant of [MExGen](https://aclanthology.org/2025.acl-long.1553/) was then used to attribute each response sentence to context sentences, yielding "gold" attribution scores. LOO is thus treated as the "skyline" method in the Evaluation section below. The LOO attribution scores were thresholded to produce a controlled-length list of context sentences, specifically a list of context sentence numbers in decreasing order of importance. The context attribution intrinsic was trained to reproduce these lists for all response sentences, given the context, response, and an instruction as input.
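The thresholding step described above can be sketched as follows. The threshold value and function name here are hypothetical, chosen only to illustrate how raw LOO attribution scores become the controlled-length, importance-ordered list of context sentence numbers that the intrinsic is trained to reproduce.

```python
def ranked_context_sentences(scores, threshold=0.1):
    """Turn per-sentence LOO attribution scores into a ranked list of
    context-sentence indices, keeping only scores above a threshold."""
    keep = [(i, s) for i, s in enumerate(scores) if s > threshold]
    keep.sort(key=lambda pair: pair[1], reverse=True)  # most important first
    return [i for i, _ in keep]

# Example: context sentences 3 and 0 matter most; 1 and 4 fall below threshold.
print(ranked_context_sentences([0.4, 0.05, 0.2, 0.7, 0.0]))  # [3, 0, 2]
```

One such ranked list is produced per response sentence, and the adapter learns to emit these lists given the tagged context, the tagged response, and an instruction.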
149
 
150
  ### Adapter Configurations
151
 
 
159
  | Max completion tokens | 4096 |
160
  | KV cache | Supported |
161
 
162
+ **Infrastructure:** The Granite 4.0 Micro Context Attribution LoRA adapter was trained on a single NVIDIA A100-80GB GPU.
163
 
164
+ **Ethical Considerations:** The Context Attribution adapter for Granite 4.0 Micro was trained specifically to reflect the behavior of Granite 4.0 Micro. While it may be applied to other LLMs, it has not been validated for them. In addition, its attributions may not always align with human judgments of which context sentences should matter, as these may differ from what matters to Granite 4.0 Micro.
165
 
166
  ## Resources
167