Update README.md
- **Model type:** LoRA adapter for [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66b69112b637dfe7a5b7f254/u3IAx-kkUTWtm7-0pJ1CA.png)

### Uncertainty Intrinsic
The Uncertainty intrinsic is designed to provide a Certainty score for model responses to user questions.
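The certainty score is emitted as a single digit from 0 to 9 and is calibrated as a percentage. A minimal sketch of consuming it, assuming the common bucket-midpoint convention that digit `d` means roughly `(10*d + 5)%` certainty (the helper name and exact mapping are our illustration, not taken from this card):

```python
# Hypothetical helper: convert the single-digit certainty token the
# intrinsic returns into a calibrated percentage. The midpoint mapping
# (10*d + 5)% is an assumption for illustration only.
def certainty_to_percent(token: str) -> int:
    d = int(token.strip())  # intrinsic emits a digit "0"-"9"
    if not 0 <= d <= 9:
        raise ValueError(f"unexpected certainty token: {token!r}")
    return 10 * d + 5  # midpoint of the 10%-wide certainty bucket
```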

### Hallucination Detection Intrinsic
The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates a hallucination, and `N` no hallucination.
### Safety Exception Intrinsic
The Safety Exception Intrinsic is designed to raise an exception when the user query is unsafe. The exception is raised by responding with `Y` (unsafe) and `N` otherwise.

The Safety Exception intrinsic was designed as a binary classifier that analyzes the user's prompt to detect a variety of harms, including violence, threats, sexual and explicit content, and requests to obtain personally identifiable information.
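Since the binary intrinsics answer with a single `Y`/`N` token, a caller typically gates on that token before doing anything else. A minimal sketch (the helper name is our own, not part of the card):

```python
# Gate on the Safety Exception intrinsic's output: "Y" means the query is
# unsafe, "N" means it is safe; anything else is treated as a model error.
def is_unsafe(intrinsic_output: str) -> bool:
    flag = intrinsic_output.strip()
    if flag not in ("Y", "N"):
        raise ValueError(f"unexpected safety token: {flag!r}")
    return flag == "Y"
```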
## Usage

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66b69112b637dfe7a5b7f254/V-iVAFV8Mco9IlBPJVEFu.png)
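The PDL example below reaches the hosted model through an OpenAI-compatible text-completions endpoint (`custom_llm_provider: text-completion-openai`). As a rough sketch of what such a request looks like in plain Python — the model name, `API_BASE` value, and payload layout are assumptions mirrored from that example, not a documented API:

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/v1"  # placeholder: insert the host address here


def build_completion_request(prompt: str, max_tokens: int = 1) -> dict:
    """Payload for an OpenAI-style /completions call, mirroring the PDL
    example's parameters (greedy decoding, single-token output)."""
    return {
        "model": "granite-8b-intrinsics-v1",
        "prompt": prompt,
        "temperature": 0,
        "max_tokens": max_tokens,
    }


def complete(prompt: str, max_tokens: int = 1) -> str:
    """POST the payload to the hosted endpoint and return the generated text."""
    req = urllib.request.Request(
        f"{API_BASE}/completions",
        data=json.dumps(build_completion_request(prompt, max_tokens)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer EMPTY",  # api_key: EMPTY in the PDL example
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```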

### Intrinsics Example with PDL
Given a hosted instance of **Granite Intrinsics 3.0 8b Instruct v1** at `API_BASE` (insert the host address here), this uses the [PDL language](https://github.com/IBM/prompt-declaration-language) to implement the RAG intrinsic invocation scenario described above. Lines elided from this excerpt are marked with `# ...`; indentation is reconstructed and may differ from the full program.
```yaml
defs:
  apply_template:
  # ...
    def: mycontext
    args:
      context: ${ context }
- model: granite-8b-intrinsics-v1
  parameters:
    api_key: EMPTY
    api_base: API_BASE
    temperature: 0
    max_tokens: 1
    custom_llm_provider: text-completion-openai
# ...
    intrinsic: safety
- role: system
  text: ${ system_prompt }
- if: ${ safety != "Y" }
  then:
    text:
    - "\n\nDocuments: ${ document }\n\n ${ query }"
    - model: openai/granite-8b-intrinsics-v1
      def: answer
      parameters: {api_key: EMPTY, api_base: API_BASE, temperature: 0, stop: "\n"}
    - call: get_intrinsic
      def: certainty
      contribute: []
# ...
- "\nCertainty: ${ certainty }"
- "\nHallucination: ${ hallucination }"
```

### Intrinsics Example with SGLang
The SGLang implementation below uses the SGLang fork at [https://github.com/frreiss/sglang/tree/granite](https://github.com/frreiss/sglang/tree/granite) that supports Granite models.
red-teamed examples.

## Evaluation

We evaluate both the performance of the intrinsics themselves and the RAG performance of the model.

We first find that the performance of the intrinsics in our shared model **Granite Intrinsics 3.0 8b v1** is not degraded versus the baseline procedure of maintaining 3 separate intrinsic models. Here, percent error is shown for the Hallucination Detection and Safety Exception intrinsics, as they have binary output, and Mean Absolute Error (MAE) is shown for the Uncertainty intrinsic, as it outputs numbers 0 to 9. For all, lower is better. Performance is calculated on a randomly drawn 400-sample validation set from each intrinsic's dataset.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66b69112b637dfe7a5b7f254/bxVAp5LLsDAM-Mo6yUZT-.png)

We then find that the RAG performance of **Granite Intrinsics 3.0 8b v1** does not suffer with respect to the base model [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct). Here we evaluate the RAGBench benchmark on the RAGAS faithfulness and correctness metrics.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66b69112b637dfe7a5b7f254/U766lYxMjyPl-zAh0remQ.png)
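The two metrics above can be stated precisely; a small sketch on toy data (our own illustration, not the card's evaluation code):

```python
# Percent error for the binary (Y/N) intrinsics and Mean Absolute Error
# for the 0-9 certainty scores; lower is better for both.
def percent_error(preds, labels):
    wrong = sum(p != y for p, y in zip(preds, labels))
    return 100.0 * wrong / len(labels)


def mean_absolute_error(preds, labels):
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)
```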
## Training Details

The **Granite Intrinsics 3.0 8b v1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs: Uncertainty Quantification, Hallucination Detection, and Safety.
* [piqa](https://huggingface.co/datasets/ybisk/piqa)

### RAG Hallucination Training Data
The following public datasets were used for finetuning. Details of the data creation for RAG response generation are available in the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf). For creating the hallucination labels for responses, the technique of [Achintalwar, et al.](https://arxiv.org/pdf/2403.06009) was used.

* [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
* [QuAC](https://huggingface.co/datasets/allenai/quac)

### Safety Exception Training Data
The following public datasets were used for finetuning.

* [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned/discussions)
* [nvidia/Aegis-AI-Content-Safety-Dataset-1.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-1.0/viewer/default/train)
* A subset of [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
* IBM/AttaQ
* [google/civil_comments](https://huggingface.co/datasets/google/civil_comments/blob/5cb696158f7a49c75722fd0c16abded746da3ea3/civil_comments.py)
* [allenai/social_bias_frames](https://huggingface.co/datasets/allenai/social_bias_frames)

## Model Card Authors

Kristjan Greenewald