cguna yannisk2 committed
Commit f2e3c77 · verified · 1 Parent(s): b0f2359

Upload README.md for citations (#27)

- Upload README.md for citations (2c0ff0068ca30464bcaf0ae7e088c58de24b6311)
- Update README.md for citations (55473a2fd912d8d60e18babcfc2cd4768110652a)


Co-authored-by: Yannis Katsis <[email protected]>

Files changed (1):
  1. citations/README.md +369 -0
citations/README.md ADDED
@@ -0,0 +1,369 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ library_name: peft
7
+ library_name: transformers
8
+ ---
9
+
10
+ # Intrinsics for Citation Generation
11
+
12
+ ## Model Summary
13
+
14
+ This is a RAG-specific family of intrinsics fine-tuned for the citation generation task. Given a multi-turn conversation between a user and an AI assistant that ends with an assistant response, together with a set of documents/passages on which that last response is supposed to be based, each intrinsic in the family generates citations for the last assistant response from the provided documents/passages. The intrinsics have the following features:
15
+ 1. **Fine-grained citations:** The intrinsic generates citations for each sentence in the assistant response (whenever supporting sentences are available in the documents/passages). Moreover, each citation consists of a set of sentences from the documents/passages that support the corresponding sentence in the assistant response.
16
+ 2. **Post-hoc citation generation:** Since the intrinsic takes the assistant response as input, it can generate citations for responses generated by any LLM. Pick your favorite LLM and use the intrinsic to generate post-hoc citations!
17
+
18
+ We provide two intrinsics implemented as LoRA adapters trained over Granite-3.3-2b-instruct and Granite-3.3-8b-instruct, respectively.
19
+
20
+ <br/>
21
+
22
+ - **Developer:** IBM Research
23
+ - **Model type:** LoRA adapter for [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct) and [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
24
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
25
+
26
+ ## Intended use
27
+ This is a family of citation generation intrinsics that generate citations for the last assistant response in a multi-turn RAG conversation, based on a set of provided documents/passages. They can be used to generate post-hoc citations for assistant responses produced by any LLM in a RAG setting.
28
+
29
+ > [!TIP]
30
+ > Note: While you can invoke a citation generation intrinsic directly, it is strongly recommended to call it through [granite-common](https://github.com/ibm-granite/granite-common), which wraps the model with a tailored I/O processor and thereby provides a friendlier development interface. The I/O processor takes care of several data transformation and validation tasks that would otherwise be required, including splitting the input documents and assistant response into sentences before calling the intrinsic, validating the intrinsic's output, and transforming the returned sentence IDs into spans over the documents and the response. Below we describe the input/output of the citation generation intrinsics when they are invoked through granite-common.
31
+
32
+ **Intrinsic input**: The input to the citation generation intrinsic is an OpenAI-compatible chat completion request, containing a list of conversation turns ending with the assistant response for which the citations should be generated as well as the list of documents from which the citations should be drawn. Please see the code snippets in the Quickstart Example section below for examples on how to specify the chat completion request as a JSON object.
33
+
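+ Schematically, a request has the following shape (abridged from the sample request shown in the Quickstart Example section; only the structure matters here):
+
+ ```
+ {
+   "messages": [
+     {"role": "user", "content": "..."},
+     {"role": "assistant", "content": "last assistant response for which citations are generated"}
+   ],
+   "extra_body": {
+     "documents": [
+       {"doc_id": "0", "text": "..."},
+       {"doc_id": "1", "text": "..."}
+     ]
+   }
+ }
+ ```
+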
34
+ **Intrinsic output**: The output of the citation generation intrinsic is formatted as the result of the original chat completion request and contains the citations for the last assistant response. The citations are provided as a JSON array, whose items include the text and begin/end offsets of a response span together with the text, document ID, and begin/end offsets of the document span that serves as a citation for that response span. When more than one document span serves as a citation for a single response span, each span is represented as a separate object in the JSON array.
35
+
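+ To make this concrete, below is a purely illustrative sketch of one item of such an array, built from the sample request in the Quickstart Example section. The field names and character offsets are assumptions made for illustration only; the exact schema is defined by the granite-common I/O configuration.
+
+ ```
+ [
+   {
+     "response_text": "Private projects are visible only to project members, internal projects are visible to ...",
+     "response_begin": 112,
+     "response_end": 287,
+     "doc_id": "0",
+     "citation_text": "* Private projects are visible only to project members.",
+     "citation_begin": 1186,
+     "citation_end": 1242
+   }
+ ]
+ ```
+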
36
+ **Going from input to output**: When calling the intrinsic through granite-common, the input chat completion request is transformed into the corresponding output in three steps (the steps are also exemplified in the code snippets included in the Quickstart Example section below):
+ 1. Pass the request to the input processor (also referred to as IntrinsicsRewriter) provided by granite-common. The input processor converts the request to the format expected by the underlying citation generation model; among other things, it splits the last assistant response and the documents into sentences, prepends them with sentence IDs, and adds an appropriate task-specific instruction.
+ 2. Pass the input processor's result to the underlying citation generation model for inference. The model generates citations in a compact representation consisting of sentence IDs in the last assistant response and documents.
+ 3. Pass the raw model output to the output processor (also referred to as IntrinsicsResultProcessor) provided by granite-common. The output processor maps the sentence IDs back to response and document spans, producing an application-friendly format ready for consumption by downstream applications.
37
+
38
+ ## Quickstart Example
39
+
40
+ To run the citation generation intrinsics through granite-common, you can either (a) use an OpenAI-compatible inference backend, such as vLLM, or (b) use the Hugging Face Transformers library. Instructions for both approaches are provided below. Note that running inference with vLLM or another scalable OpenAI-compatible inference backend should be significantly faster than using the Hugging Face Transformers library directly.
41
+
42
+ ### Using an OpenAI-Compatible Inference Backend
43
+
44
+ To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM, follow the steps below. We recommend using Python 3.11 or higher.
45
+
46
+ 1. Install the granite-common library:
47
+ ```
48
+ pip install granite-common[nltk]
49
+ ```
50
+
51
+ 2. Install the Hugging Face CLI:
52
+ ```
53
+ pip install -U "huggingface_hub[cli]"
54
+ ```
55
+
56
+ 3. Install vLLM:
57
+ ```
58
+ pip install vllm
59
+ ```
60
+
61
+ 4. Download the intrinsics library:
62
+ ```
63
+ hf download ibm-granite/rag-intrinsics-lib --local-dir ./rag-intrinsics-lib
64
+ ```
65
+
66
+ 5. Edit the vLLM startup script found in `./rag-intrinsics-lib/run_vllm.sh` using your favorite editor:
67
+
68
+ Edit the constants `BASE_MODEL_NAME` and `BASE_MODEL_ORG` depending on the base model on which the desired LoRA adapter has been trained. Optionally, edit the constant `PORT` to change the port on which vLLM will run. Save the modified file and exit the editor.
69
+
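+ For example, to use the adapter trained over Granite-3.3-8b-instruct, the edited values might look roughly as follows (illustrative only; check the actual variable syntax in `run_vllm.sh`, and keep the port consistent with the client code in step 7):
+
+ ```
+ BASE_MODEL_ORG="ibm-granite"
+ BASE_MODEL_NAME="granite-3.3-8b-instruct"
+ PORT=55555
+ ```
+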
70
+ 6. Start vLLM through the startup script. The first time you run the script, you may have to change the permissions to allow execution:
71
+ ```
72
+ cd rag-intrinsics-lib
73
+ chmod u+x ./run_vllm.sh
74
+ ./run_vllm.sh &
75
+ ```
76
+
77
+ 7. Run the following code snippet:
78
+
79
+ ```
80
+ import json
81
+ import openai
82
+ import granite_common
83
+
84
+ intrinsic_name = "citations"
85
+
86
+ # Change the following constant to select a different base model
87
+ base_model_name = "granite-3.3-8b-instruct"
88
+
89
+ # Change the following constants as needed to reflect the location of the vLLM server
90
+ # The selected port should be identical to the one you specified in the vLLM startup script
91
+ openai_base_url = "http://localhost:55555/v1"
92
+ openai_api_key = "rag_intrinsics_1234"
93
+
94
+ # Fetch IO configuration file from Hugging Face Hub
95
+ io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
96
+ intrinsic_name, base_model_name
97
+ )
98
+
99
+ # Instantiate input/output processors
100
+ rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
101
+ result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)
102
+
103
+ # Sample request
104
+ request_json = {
105
+ "messages": [
106
+ {
107
+ "role": "user",
108
+ "content": "What is the visibility level of Git Repos and Issue Tracking projects?"
109
+ },
110
+ {
111
+ "role": "assistant",
112
+ "content": "Git Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. Private projects are visible only to project members, internal projects are visible to all users that are logged in to IBM Cloud, and public projects are visible to anyone. By default, new projects are set to private visibility level, which is the most secure for your data."
113
+ }
114
+ ],
115
+ "extra_body": {
116
+ "documents": [
117
+ {
118
+ "doc_id": "0",
119
+ "text": "Git Repos and Issue Tracking is an IBM-hosted component of the Continuous Delivery service. All of the data that you provide to Git Repos and Issue Tracking, including but not limited to source files, issues, pull requests, and project configuration properties, is managed securely within Continuous Delivery. However, Git Repos and Issue Tracking supports various mechanisms for exporting, sending, or otherwise sharing data to users and third parties. The ability of Git Repos and Issue Tracking to share information is typical of many social coding platforms. However, such sharing might conflict with regulatory controls that apply to your business. After you create a project in Git Repos and Issue Tracking, but before you entrust any files, issues, records, or other data with the project, review the project settings and change any settings that you deem necessary to protect your data. Settings to review include visibility levels, email notifications, integrations, web hooks, access tokens, deploy tokens, and deploy keys. Project visibility levels \n\nGit Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. * Private projects are visible only to project members. This setting is the default visibility level for new projects, and is the most secure visibility level for your data. * Internal projects are visible to all users that are logged in to IBM Cloud. * Public projects are visible to anyone. To limit project access to only project members, complete the following steps:\n\n\n\n1. From the project sidebar, click Settings > General. 2. On the General Settings page, click Visibility > project features > permissions. 3. Locate the Project visibility setting. 4. Select Private, if it is not already selected. 5. Click Save changes. Project membership \n\nGit Repos and Issue Tracking is a cloud hosted social coding environment that is available to all Continuous Delivery users. If you are a Git Repos and Issue Tracking project Maintainer or Owner, you can invite any user and group members to the project. IBM Cloud places no restrictions on who you can invite to a project."
120
+ },
121
+ {
122
+ "doc_id": "1",
123
+ "text": "After you create a project in Git Repos and Issue Tracking, but before you entrust any files, issues, records, or other data with the project, review the project settings and change any settings that are necessary to protect your data. Settings to review include visibility levels, email notifications, integrations, web hooks, access tokens, deploy tokens, and deploy keys. Project visibility levels \n\nGit Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. * Private projects are visible only to project members. This setting is the default visibility level for new projects, and is the most secure visibility level for your data. * Internal projects are visible to all users that are logged in to IBM Cloud. * Public projects are visible to anyone. To limit project access to only project members, complete the following steps:\n\n\n\n1. From the project sidebar, click Settings > General. 2. On the General Settings page, click Visibility > project features > permissions. 3. Locate the Project visibility setting. 4. Select Private, if it is not already selected. 5. Click Save changes. Project email settings \n\nBy default, Git Repos and Issue Tracking notifies project members by way of email about project activities. These emails typically include customer-owned data that was provided to Git Repos and Issue Tracking by users. For example, if a user posts a comment to an issue, Git Repos and Issue Tracking sends an email to all subscribers. The email includes information such as a copy of the comment, the user who posted it, and when the comment was posted. To turn off all email notifications for your project, complete the following steps:\n\n\n\n1. From the project sidebar, click Settings > General. 2. On the **General Settings **page, click Visibility > project features > permissions. 3. Select the Disable email notifications checkbox. 4. Click Save changes. Project integrations and webhooks"
124
+ }
125
+ ]
126
+ }
127
+ }
128
+
129
+ # Add other parameters
130
+ request_json["model"] = intrinsic_name
131
+ request_json["temperature"] = 0.0
132
+
133
+ # Apply input processor
134
+ rewritten_request = rewriter.transform(request_json)
135
+
136
+ # Run inference
137
+ client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
138
+ chat_completion = client.chat.completions.create(**rewritten_request.model_dump())
139
+
140
+ # Apply output processor
141
+ processed_chat_completion = result_processor.transform(
142
+ chat_completion, rewritten_request
143
+ )
144
+
145
+ # Verify that the content of the completion is valid JSON and pretty-print the JSON.
146
+ parsed_contents = json.loads(processed_chat_completion.choices[0].message.content)
147
+ print("JSON output:")
148
+ print(json.dumps(parsed_contents, indent=2))
149
+ ```
150
+
151
+ ### Using the Hugging Face Transformers Library
152
+
153
+ To run the intrinsic using the Hugging Face Transformers library directly, follow the steps below. We recommend using Python 3.11 or higher.
154
+
155
+ 1. Install the granite-common library:
156
+ ```
157
+ pip install granite-common[nltk]
158
+ ```
159
+
160
+ 2. Install the Hugging Face CLI:
161
+ ```
162
+ pip install -U "huggingface_hub[cli]"
163
+ ```
164
+
165
+ 3. Install PEFT:
166
+ ```
167
+ pip install peft
168
+ ```
169
+
170
+ 4. Install xgrammar:
171
+ ```
172
+ pip install xgrammar
173
+ ```
174
+
175
+ 5. Run the following code snippet:
176
+
177
+ ```
178
+ import json
179
+ import granite_common.util
180
+ import peft
181
+
182
+ intrinsic_name = "citations"
183
+
184
+ # Change the following constant to select a different base model
185
+ base_model_name = "granite-3.3-8b-instruct"
186
+
187
+ use_cuda = True # Set to False to use default PyTorch device for this machine + model
188
+
189
+ # Fetch IO configuration file from Hugging Face Hub
190
+ io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
191
+ intrinsic_name, base_model_name
192
+ )
193
+
194
+ # Fetch LoRA directory from Hugging Face Hub
195
+ lora_dir = granite_common.intrinsics.util.obtain_lora(
196
+ intrinsic_name, base_model_name
197
+ )
198
+
199
+ # Instantiate input/output processors
200
+ rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
201
+ result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)
202
+
203
+ # Sample request
204
+ request_json = {
205
+ "messages": [
206
+ {
207
+ "role": "user",
208
+ "content": "What is the visibility level of Git Repos and Issue Tracking projects?"
209
+ },
210
+ {
211
+ "role": "assistant",
212
+ "content": "Git Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. Private projects are visible only to project members, internal projects are visible to all users that are logged in to IBM Cloud, and public projects are visible to anyone. By default, new projects are set to private visibility level, which is the most secure for your data."
213
+ }
214
+ ],
215
+ "extra_body": {
216
+ "documents": [
217
+ {
218
+ "doc_id": "0",
219
+ "text": "Git Repos and Issue Tracking is an IBM-hosted component of the Continuous Delivery service. All of the data that you provide to Git Repos and Issue Tracking, including but not limited to source files, issues, pull requests, and project configuration properties, is managed securely within Continuous Delivery. However, Git Repos and Issue Tracking supports various mechanisms for exporting, sending, or otherwise sharing data to users and third parties. The ability of Git Repos and Issue Tracking to share information is typical of many social coding platforms. However, such sharing might conflict with regulatory controls that apply to your business. After you create a project in Git Repos and Issue Tracking, but before you entrust any files, issues, records, or other data with the project, review the project settings and change any settings that you deem necessary to protect your data. Settings to review include visibility levels, email notifications, integrations, web hooks, access tokens, deploy tokens, and deploy keys. Project visibility levels \n\nGit Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. * Private projects are visible only to project members. This setting is the default visibility level for new projects, and is the most secure visibility level for your data. * Internal projects are visible to all users that are logged in to IBM Cloud. * Public projects are visible to anyone. To limit project access to only project members, complete the following steps:\n\n\n\n1. From the project sidebar, click Settings > General. 2. On the General Settings page, click Visibility > project features > permissions. 3. Locate the Project visibility setting. 4. Select Private, if it is not already selected. 5. Click Save changes. Project membership \n\nGit Repos and Issue Tracking is a cloud hosted social coding environment that is available to all Continuous Delivery users. If you are a Git Repos and Issue Tracking project Maintainer or Owner, you can invite any user and group members to the project. IBM Cloud places no restrictions on who you can invite to a project."
220
+ },
221
+ {
222
+ "doc_id": "1",
223
+ "text": "After you create a project in Git Repos and Issue Tracking, but before you entrust any files, issues, records, or other data with the project, review the project settings and change any settings that are necessary to protect your data. Settings to review include visibility levels, email notifications, integrations, web hooks, access tokens, deploy tokens, and deploy keys. Project visibility levels \n\nGit Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. * Private projects are visible only to project members. This setting is the default visibility level for new projects, and is the most secure visibility level for your data. * Internal projects are visible to all users that are logged in to IBM Cloud. * Public projects are visible to anyone. To limit project access to only project members, complete the following steps:\n\n\n\n1. From the project sidebar, click Settings > General. 2. On the General Settings page, click Visibility > project features > permissions. 3. Locate the Project visibility setting. 4. Select Private, if it is not already selected. 5. Click Save changes. Project email settings \n\nBy default, Git Repos and Issue Tracking notifies project members by way of email about project activities. These emails typically include customer-owned data that was provided to Git Repos and Issue Tracking by users. For example, if a user posts a comment to an issue, Git Repos and Issue Tracking sends an email to all subscribers. The email includes information such as a copy of the comment, the user who posted it, and when the comment was posted. To turn off all email notifications for your project, complete the following steps:\n\n\n\n1. From the project sidebar, click Settings > General. 2. On the **General Settings **page, click Visibility > project features > permissions. 3. Select the Disable email notifications checkbox. 4. Click Save changes. Project integrations and webhooks"
224
+ }
225
+ ]
226
+ }
227
+ }
228
+
229
+ # Add additional parameters
230
+ request_json["model"] = intrinsic_name
231
+ request_json["temperature"] = 0.0
232
+
233
+ # Apply input processor
234
+ rewritten_request = rewriter.transform(request_json)
235
+
236
+ # Load the base model and merge LoRA weights
237
+ model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)
238
+ if use_cuda:
239
+ model = model.cuda()
240
+
241
+ # Convert the chat completion request into the Transformers library's proprietary
242
+ # format.
243
+ generate_input, other_input = (
244
+ granite_common.util.chat_completion_request_to_transformers_inputs(
245
+ rewritten_request,
246
+ tokenizer,
247
+ model,
248
+ )
249
+ )
250
+
251
+ # Use the Transformers library's APIs to generate one or more completions,
252
+ # then convert those completions into OpenAI-compatible chat completion responses.
253
+ responses = granite_common.util.generate_with_transformers(
254
+ tokenizer, model, generate_input, other_input
255
+ )
256
+
257
+ # Apply output processor
258
+ transformed_responses = result_processor.transform(responses, rewritten_request)
259
+
260
+ # Verify that the content of the completion is valid JSON and pretty-print the JSON.
261
+ parsed_contents = json.loads(transformed_responses.choices[0].message.content)
262
+ print("JSON output:")
263
+ print(json.dumps(parsed_contents, indent=2))
264
+ ```
265
+
266
+ ## Training Details
267
+
268
+ The citation generation intrinsics were trained on synthetically-generated citation datasets. The process of generating the training data consisted of two main steps:
269
+ - **Multi-turn RAG conversation generation:** Starting from publicly available document corpora, we generated a set of multi-turn RAG data, consisting of multi-turn conversations grounded on passages retrieved from the corpora. For details on the RAG conversation generation process please refer to the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf) and [Lee, Young-Suk, et al.](https://arxiv.org/pdf/2409.11500).
270
+ - **Citation generation:** For each turn of the multi-turn RAG conversations from the previous step, we used a multi-step synthetic citation generation pipeline to generate citations for the assistant response.
271
+
272
+ The resulting data instances were used to train the citation generation intrinsics.
273
+
274
+ ### Training Data
275
+
276
+ The following public datasets were used as seed datasets for the multi-turn RAG conversation generation process:
277
+ - [CoQA](https://stanfordnlp.github.io/coqa/) - Wikipedia passages
278
+ - [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
279
+ - [QuAC](https://huggingface.co/datasets/allenai/quac)
280
+
281
+
282
+ ## Evaluation
283
+
284
+ We evaluate the citation generation intrinsics on two citation benchmarks:
285
+ - [ALCE](https://aclanthology.org/2023.emnlp-main.398/): Evaluates the ability of models to produce document/passage-level citations (i.e., identify the documents/passages that support a statement in the response).
286
+ - [LongBench-Cite](https://arxiv.org/abs/2409.02897): Evaluates the ability of models to produce fine-grained span-level citations (i.e., identify the spans within the input documents/passages that support a statement in the response) with a focus on long contexts.
287
+
288
+ Since the intrinsics correspond to a post-hoc citation generation approach, their performance on the two benchmarks depends on the assistant responses for which they are asked to generate citations. To facilitate an apples-to-apples comparison, for each experiment, we keep the assistant responses the same and change the model that is used to generate the citations. In particular, we prompt an LLM to create an assistant response together with citations and evaluate the generated citations on the corresponding benchmark. Then, we compute and evaluate the citations generated for the same LLM response by each of the citation generation intrinsics. We provide results for the two intrinsics, implemented as LoRA adapters over Granite-3.3-2b-instruct and Granite-3.3-8b-instruct, respectively.
289
+
290
+ ### Evaluation on ALCE
291
+
292
+ For the ALCE evaluation, we prompt Llama-3.1-70B-Instruct and Mixtral-8x22B-Instruct to generate both the assistant response and corresponding passage-level citations. We first calculate the performance of the citations generated by these models on ALCE. Subsequently, we feed the responses of these models (leaving out the citations) to the citation generation intrinsics and evaluate their generated citations. The results are shown in the table below:
293
+
294
+ | Model used to generate response | Model used to generate citations | Recall | Precision | F1 |
295
+ |--------------| ----------------------------- | --------------- | ----------------- | --------- |
296
+ | Llama-3.1-70B-Instruct | Llama-3.1-70B-Instruct | 61.4 | 58.1 | 59.7 |
297
+ | Llama-3.1-70B-Instruct | Granite-3.3-2B LoRA citations | 51.5 | 64.2 | 57.2 |
298
+ | Llama-3.1-70B-Instruct | Granite-3.3-8B LoRA citations | 55.4 | 64.2 | 59.5 |
299
+ | Mixtral-8x22B-Instruct | Mixtral-8x22B-Instruct | 62.2 | 62.5 | 62.3 |
300
+ | Mixtral-8x22B-Instruct | Granite-3.3-2B LoRA citations | 51.4 | 67.3 | 58.3 |
301
+ | Mixtral-8x22B-Instruct | Granite-3.3-8B LoRA citations | 55.8 | 68.5 | 61.5 |
302
+
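+ The reported F1 values appear consistent with the standard harmonic mean of precision and recall; for example, for the first row:
+
+ $$F_1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \cdot 58.1 \cdot 61.4}{58.1 + 61.4} \approx 59.7$$
+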
303
+ We observe that the LoRA adapter over Granite-3.3-8b-instruct performs on par with much bigger models when those are prompted to create passage-level citations (with the LoRA adapter over Granite-3.3-2b-instruct being slightly worse). It is interesting to note that while the adapter's F1 performance is similar to the baselines, it exhibits a different precision-recall trade-off, trading lower recall for higher precision.
304
+
305
+ Notes:
306
+ - All results are reported on the ELI5 dataset using the ORACLE (5-psg) setting.
307
+ - To prompt Llama and Mixtral, we employ a setting similar to the one proposed in the ALCE paper; in particular, we use a two-shot prompt consisting of two of the ICL examples from ALCE as well as a slightly modified version of the instruction from the paper.
308
+ - Sentence splitting of context/response is performed using NLTK.
309
+ - Finally, since ALCE expects passage-level citations, we elevate the finer-grained citations produced by the LoRA adapter to the passage level before running the ALCE evaluation (one possible implementation of this step is sketched after this list).
310
+
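+ One possible implementation of this elevation step is sketched below. This is illustrative only: it assumes the hypothetical citation fields shown in the Intrinsic output section and is not necessarily the exact procedure used in our evaluation.
+
+ ```
+ # Elevate span-level citations to passage-level ones by collecting, for each cited
+ # response span, the set of documents it draws citations from.
+ from collections import defaultdict
+
+ def to_passage_level(citations):
+     docs_per_span = defaultdict(set)
+     for c in citations:
+         docs_per_span[(c["response_begin"], c["response_end"])].add(c["doc_id"])
+     return docs_per_span
+ ```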
311
+
312
+ ### Evaluation on LongBench-Cite
313
+
314
+ For the LongBench-Cite evaluation, we prompt Llama-3.1-70B-Instruct to generate both the assistant response and the corresponding citations. Then we evaluate the citations generated by Llama as well as the post-hoc citations generated by the citation generation intrinsics when invoked on the Llama responses. The results are shown in the table below:
315
+
316
+ <table>
317
+ <tr>
318
+ <th>Model used to generate response</th>
319
+ <th>Model used to generate citations</th>
320
+ <th colspan="3">LongBench-Chat (en)</th>
321
+ <th colspan="3">MultifieldQA (en)</th>
322
+ <th colspan="3">HotpotQA</th>
323
+ <th colspan="3">GovReport</th>
324
+ </tr>
325
+ <tr>
326
+ <th></th>
327
+ <th></th>
328
+ <th>R</th><th>P</th><th>F1</th>
329
+ <th>R</th><th>P</th><th>F1</th>
330
+ <th>R</th><th>P</th><th>F1</th>
331
+ <th>R</th><th>P</th><th>F1</th>
332
+ </tr>
333
+ <tr>
334
+ <td>Llama-3.1-70B-Instruct</td>
335
+ <td>Llama-3.1-70B-Instruct</td>
336
+ <td>27.0</td><td>34.4</td><td>26.1</td>
337
+ <td>46.1</td><td>63.3</td><td>49.7</td>
338
+ <td>34.0</td><td>39.4</td><td>30.2</td>
339
+ <td>55.0</td><td>77.5</td><td>62.0</td>
340
+ </tr>
341
+ <tr>
342
+ <td>Llama-3.1-70B-Instruct</td>
343
+ <td>Granite-3.3-2B LoRA citations</td>
344
+ <td>38.7</td><td>47.4</td><td>39.3</td>
345
+ <td>66.4</td><td>81.8</td><td>70.4</td>
346
+ <td>60.7</td><td>68.5</td><td>59.7</td>
347
+ <td>60.1</td><td>72.4</td><td>64.7</td>
348
+ </tr>
349
+ <tr>
350
+ <td>Llama-3.1-70B-Instruct</td>
351
+ <td>Granite-3.3-8B LoRA citations</td>
352
+ <td>54.5</td><td>59.9</td><td>55.6</td>
353
+ <td>73.0</td><td>82.9</td><td>75.7</td>
354
+ <td>68.5</td><td>73.8</td><td>66.4</td>
355
+ <td>73.5</td><td>84.6</td><td>78.2</td>
356
+ </tr>
357
+ </table>
358
+
359
+ We observe that both variants of the LoRA adapter (even the one trained over Granite-3.3-2b-instruct) perform significantly better across the board than Llama-3.1-70B-Instruct when the latter is prompted to create span-level citations. This demonstrates the value of the adapters for creating post-hoc citations, even for assistant responses generated by much bigger LLMs.
360
+
361
+ Notes:
362
+ - The evaluation results are reported on the English subset of LongBench-Cite (i.e., restricted to instances whose `language` field equals `en`).
363
+ - To prompt Llama to generate a response with citations, we use the one-shot prompt described in the paper.
364
+ - For the LoRA adapter, sentence splitting of the context is performed using NLTK. For the response, we reuse the splitting in Llama's output (since the LongBench-Cite prompt instructs the model to output a response split into sentences/statements).
365
+
366
+ ## Model Card Authors
367
+
368
+ [Yannis Katsis](mailto:[email protected])<br/>
369
+ [Chulaka Gunasekara](mailto:[email protected])