Update README.md
README.md CHANGED
@@ -68,28 +68,23 @@ The model will output the score as 'PASS' if the answer is faithful to the docum
To run inference, you can use HF pipeline:

```
-import transformers

-)
+model_name = 'PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct'
+pipe = pipeline(
+    "text-generation",
+    model=model_name,
+    max_new_tokens=600,
+    device="cuda",
+    return_full_text=False
+)

messages = [
    {"role": "user", "content": prompt},
]

-temperature=0
-)
+result = pipe(messages)
+print(result[0]['generated_text'])

-print(outputs[0]["generated_text"])
```

Since the model is trained in chat format, ensure that you pass the prompt as a user message.
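The updated snippet in this hunk relies on a `pipeline` import and a `prompt` variable that are defined elsewhere in the card. A minimal, self-contained sketch of the same flow is shown below; the example prompt is only an illustrative placeholder, not the model's official prompt template.

```
# Minimal end-to-end sketch of the updated example (assumes transformers is
# installed and a CUDA device is available; the prompt below is a placeholder,
# not the model's official prompt template).
from transformers import pipeline

model_name = "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct"

pipe = pipeline(
    "text-generation",
    model=model_name,
    max_new_tokens=600,
    device="cuda",
    return_full_text=False,
)

# Placeholder faithfulness-check prompt: the model is expected to judge whether
# the ANSWER is supported by the DOCUMENT and output PASS or FAIL.
prompt = (
    "Given the QUESTION, DOCUMENT and ANSWER below, is the ANSWER faithful to "
    "the DOCUMENT?\n"
    "QUESTION: What is the capital of France?\n"
    "DOCUMENT: Paris is the capital and most populous city of France.\n"
    "ANSWER: The capital of France is Paris."
)

# The model is trained in chat format, so the prompt goes in as a user message.
messages = [{"role": "user", "content": prompt}]

result = pipe(messages)
print(result[0]["generated_text"])
```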
@@ -100,7 +95,21 @@ For more information on training details, refer to our [ArXiv paper](https://arx

The model was evaluated on [PatronusAI/HaluBench](https://huggingface.co/datasets/PatronusAI/HaluBench).

+
+| Model | HaluEval | RAGTruth | FinanceBench | DROP | CovidQA | PubmedQA | Overall |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| GPT-4o | 87.9% | 84.3% | **85.3%** | 84.3% | 95.0% | 82.1% | 86.5% |
+| GPT-4-Turbo | 86.0% | **85.0%** | 82.2% | 84.8% | 90.6% | 83.5% | 85.0% |
+| GPT-3.5-Turbo | 62.2% | 50.7% | 60.9% | 57.2% | 56.7% | 62.8% | 58.7% |
+| Claude-3-Sonnet | 84.5% | 79.1% | 69.7% | 84.3% | 95.0% | 82.9% | 78.8% |
+| Claude-3-Haiku | 68.9% | 78.9% | 58.4% | 84.3% | 95.0% | 82.9% | 69.0% |
+| RAGAS Faithfulness | 70.6% | 75.8% | 59.5% | 59.6% | 75.0% | 67.7% | 66.9% |
+| Mistral-Instruct-7B | 78.3% | 77.7% | 56.3% | 56.3% | 71.7% | 77.9% | 69.4% |
+| Llama-3-Instruct-8B | 83.1% | 80.0% | 55.0% | 58.2% | 75.2% | 70.7% | 70.4% |
+| Llama-3-Instruct-70B | 87.0% | 83.8% | 72.7% | 69.4% | 85.0% | 82.6% | 80.1% |
+| LYNX (8B) | 85.7% | 80.0% | 72.5% | 77.8% | 96.3% | 85.2% | 82.9% |
+| LYNX (70B) | **88.4%** | 80.2% | 81.4% | **86.4%** | **97.5%** | **90.4%** | **87.4%** |
+

## Citation
If you are using the model, cite using
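As a rough illustration of how the HaluBench evaluation above could be approximated with the same pipeline, a scoring loop is sketched below. The split name, the field names (`question`, `passage`, `answer`, `label`), and the PASS/FAIL string matching are assumptions about the dataset schema and scoring rule, not the published evaluation harness.

```
# Rough sketch of scoring on HaluBench (assumed schema and scoring rule; the
# split, field names, and PASS/FAIL matching below are illustrative guesses,
# not the official evaluation harness).
from datasets import load_dataset
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct",
    max_new_tokens=600,
    device="cuda",
    return_full_text=False,
)

halubench = load_dataset("PatronusAI/HaluBench", split="test")  # assumed split
sample = halubench.select(range(100))  # small sample for illustration

correct = 0
for ex in sample:
    prompt = (
        f"QUESTION: {ex['question']}\n"  # assumed field name
        f"DOCUMENT: {ex['passage']}\n"   # assumed field name
        f"ANSWER: {ex['answer']}\n"      # assumed field name
        "Is the ANSWER faithful to the DOCUMENT? Respond PASS or FAIL."
    )
    output = pipe([{"role": "user", "content": prompt}])[0]["generated_text"]
    prediction = "PASS" if "PASS" in output.upper() else "FAIL"
    correct += int(prediction == str(ex["label"]).upper())  # assumed label format

print(f"Accuracy on the sample: {correct / len(sample):.1%}")
```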
@@ -116,4 +125,5 @@ If you are using the model, cite using

## Model Card Contact
[@sunitha-ravi](https://huggingface.co/sunitha-ravi)
-[@RebeccaQian1](https://huggingface.co/RebeccaQian1)
+[@RebeccaQian1](https://huggingface.co/RebeccaQian1)
+[@presidev](https://huggingface.co/presidev)