DebateLabKIT/Llama-3.1-Argunaut-1-8B-HIRPO

@@ -1,17 +1,32 @@
 ---
 library_name: transformers
-model_name: Llama-3.1-Argunaut-1-8B-HIRPO-dev4
 tags:
 - generated_from_trainer
 - trl
-- dpo
-licence: license
 ---
-# Model Card for Llama-3.1-Argunaut-1-8B-HIRPO-dev4
-This model is a fine-tuned version of [None](https://huggingface.co/None).
-It has been trained using [TRL](https://github.com/huggingface/trl).
 ## Quick start
@@ -19,17 +34,17 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 from transformers import pipeline
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="DebateLabKIT/Llama-3.1-Argunaut-1-8B-HIRPO-dev4", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ggbetz/argunauts-training/runs/8afyfsbz)
-This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
 ### Framework versions
@@ -39,20 +54,146 @@ This model was trained with DPO, a method introduced in [Direct Preference Optim
 - Datasets: 3.1.0
 - Tokenizers: 0.20.3
 ## Citations
-Cite DPO as:
 ```bibtex
-@inproceedings{rafailov2023direct,
-    title        = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
-    author       = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
-    year         = 2023,
-    booktitle    = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
-    url          = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
-    editor       = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
-}
-```
 Cite TRL as:

 ---
 library_name: transformers
+model_name: Llama-3.1-Argunaut-1-8B-HIRPO
+pipeline_tag: text-generation
+base_model: DebateLabKIT/Llama-3.1-Argunaut-1-8B-SPIN
+datasets:
+- DebateLabKIT/arguments-and-debates
 tags:
+- logic
+- argumentation
+- critical-thinking
+- argument-mapping
 - generated_from_trainer
 - trl
+- rlvr
+- hirpo
+license: llama3.1
 ---
+# Model Card for Llama-3.1-Argunaut-1-8B-HIRPO
+This model is a fine-tuned version of [DebateLabKIT/Llama-3.1-Argunaut-1-8B-SPIN](https://huggingface.co/DebateLabKIT/Llama-3.1-Argunaut-1-8B-SPIN).
+It has been trained using [TRL](https://github.com/huggingface/trl) and [vLLM](https://docs.vllm.ai/). Checkpoints are tagged.
+📘 [HF Blog Article](https://huggingface.co/blog/ggbetz/argunauts-phase-3)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://api.wandb.ai/links/ggbetz/47t0e7i8)
 ## Quick start
 ```python
 from transformers import pipeline
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="DebateLabKIT/Llama-3.1-Argunaut-1-8B-HIRPO", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 ## Training procedure
+This model was trained with Hindsight Instruction Relabeling Preference Optimization (HIRPO), an online DPO variant derived from [The Wisdom of Hindsight Makes Language Models Better Instruction Followers](https://huggingface.co/papers/2302.05206).
+More details about the training procedure can be found in the [blog post](https://huggingface.co/blog/ggbetz/argunauts-phase-3).
 ### Framework versions
 - Datasets: 3.1.0
 - Tokenizers: 0.20.3
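To compare a local environment against the versions listed above, a small check along these lines may help. It is only a sketch: the `EXPECTED` mapping repeats just the two entries visible here (Datasets 3.1.0, Tokenizers 0.20.3), the helper names are hypothetical, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Only the versions visible in the list above; extend as needed (assumption).
EXPECTED = {"datasets": "3.1.0", "tokenizers": "0.20.3"}


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_report(expected: dict) -> dict:
    """Map each package name to a tuple (expected, installed, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        report[package] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, found, ok) in version_report(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{package}: expected {wanted}, installed {found} ({status})")
```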
+## Evaluation
+### Chat Experience
+<details>
+<code>
+👤 **You**
+Hi, are you familiar with Argdown?
+🤖 **Assistant**
+Yes, I'm familiar with Argdown, a markup language for logical and critical thinking, particularly suited for representing and analyzing arguments. It allows for the clear and structured representation of arguments, including claims and arguments, as well as inference trees. If you have an argument you'd like me to analyze or need assistance with a critical thinking problem, feel free to provide the details!
+👤 **You**
+Can you please Argdown argument map that visualizes the argumentative macro structure of the following text?
+:::
+School uniforms deter crime and increase student safety.
+In Long Beach, California, after two years of a district-wide K-8 mandatory uniform policy, reports of assault and battery in the district’s schools decreased by 34%, assault with a deadly weapon dropped by 50%, fighting incidents went down by 51%, sex offenses were cut by 74%, robbery dropped by 65%, possession of weapons (or weapon “look-alikes”) decreased by 52%, possession of drugs went down by 69%, and vandalism was lowered by 18%. [64]
+One year after Sparks Middle School in Nevada instituted a uniform policy, school police data showed a 63% drop in police log reports, and decreases were also noted in gang activity, student fights, graffiti, property damage, and battery. A peer-reviewed study found that schools with uniform policies had 12% fewer firearm-related incidents and 15% fewer drug-related incidents than schools without uniforms. [25][69]
+School uniforms also prevent students from concealing weapons under baggy clothing, make it easier to keep track of students on field trips, and make intruders on campus more visible. Frank Quatrone, superintendent in the Lodi school district of New Jersey, states, “When you have students dressed alike, you make them safer. If someone were to come into a building, the intruder could easily be recognized.” [6][38]
+Further, school uniforms create a level playing field among students, reducing peer pressure and bullying. When all students are dressed alike, competition between students over clothing choices and the teasing of those who are dressed in less expensive or less fashionable outfits can be eliminated. Research by the Schoolwear Association found that 83% of teachers thought “a good school uniform…could prevent bullying based on appearance or economic background.” Arminta Jacobson, founder and director of the Center for Parent Education at the University of North Texas, states that uniforms put “all kids on the same playing field in terms of their appearance. I think it probably gives them a sense of belonging and a feeling of being socially accepted.” [5][91]
+:::
+🤖 **Assistant**
+The provided text does not contain an argument with clearly defined premises and a conclusion. It presents some claims and supports them with evidence, but it does not explicitly state an argument with a clear structure of premises and a conclusion. Therefore, there is no argument to map.
+If you provide the full argument, including the premises and the conclusion, I can assist you in creating an Argdown argument map. Please share the complete argument.
+👤 **You**
+That is correct. But I'm asking you to map these support and attack relations using Argdown Syntax in an argdown snippet.
+🤖 **Assistant**
+Given the information provided, it appears there's an implicit argument structure. However, without explicit premises and a conclusion, it's challenging to create a clear Argdown map. Nonetheless, let's represent the support and attack relations as described:
+```argdown
+[Conclusion~1]: School uniforms deter crime and increase student safety.
+    <+ <Subargument 1>：
+        <+ [Conclusion~2]: Implementing school uniforms reduces bullying based on appearance or economic background.
+            <+ <Subargument 2>：
+                <+ [Premise]: 83% of teachers think that a good school uniform can prevent bullying based on appearance or economic background.
+        <+ [Premise]: Reducing bullying based on appearance or economic background contributes to a sense of belonging and social acceptance among students.
+        <+ [Premise]: A sense of belonging and social acceptance among students is a desirable outcome.
+```
+👤 **You**
+Please summarize, in your own words, the main argument presented in the text. No more than 2-3 sentemces, please.
+🤖 **Assistant**
+The argument's key point is: School uniforms deter crime and increase student safety.
+👤 **You**
+Let's focus on this argument. Just for the protocol, what's the conclusion?
+🤖 **Assistant**
+The text doesn't explicitly state a conclusion. It provides supporting evidence and quotes from various sources. If I had to infer a conclusion based on the text, it would be:
+School uniforms deter crime and increase student safety.
+👤 **You**
+Now, can you please reconstruct this argument as premise conclusion structure (standard form), using argdown syntax? Please give the argument a succinct title before starting with the premise conclusion structure.
+🤖 **Assistant**
+The argument can be reconstructed as follows:
+* The argument does not explicitly state that the conclusion follows from the premises, so I will not add an inference figure.
+```argdown
+(1) 83% of teachers think that a good school uniform can prevent bullying based on appearance or economic background.
+(2) Reducing bullying based on appearance or economic background contributes to a sense of belonging and social acceptance among students.
+(3) A sense of belonging and social acceptance among students is a desirable outcome.
+--
+School uniforms deter crime and increase student safety.
+```
+👤 **You**
+Hm, please explain: How do the three premises collectively justify the conclusion?
+🤖 **Assistant**
+The premises collectively justify the conclusion by establishing a connection between school uniforms, bullying, and student well-being. Specifically, the first premise indicates that a significant majority of teachers support the idea that school uniforms can reduce bullying, thereby creating a safer and more harmonious school environment. The second premise suggests that reducing bullying leads to a positive social dynamic among students, where they feel a sense of belonging and acceptance. The third premise reinforces the value of this social dynamic by affirming that it is a desirable outcome. Putting these premises together, one can infer that school uniforms, which can reduce bullying, contribute to creating a more positive and safer school environment, thereby supporting the conclusion that school uniforms deter crime and increase student safety.
+👤 **You**
+Can you add a connecting premise that makes the link between a positive environment on the one side and safety / crime deterrence on the other explicit?
+🤖 **Assistant**
+Let me adjust the argdown code accordingly.
+```argdown
+(1) 83% of teachers think that a good school uniform can prevent bullying based on appearance or economic background.
+(2) Reducing bullying based on appearance or economic background contributes to a sense of belonging and social acceptance among students.
+(3) A sense of belonging and social acceptance among students is a desirable outcome.
+(4) A positive school environment contributes to a safer school environment.
+--------------------------
+(5) School uniforms deter crime and increase student safety.
+```
+</code>
+</details>
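In the transcript above, the source text is delimited by `:::` lines and the model answers with fenced `argdown` snippets. The following is a minimal sketch of how such a request could be assembled and how the snippets could be pulled back out of a reply. The helper names `build_argmap_messages` and `extract_argdown` are hypothetical and not part of the model repository or any library, and the prompt wording simply mirrors the example conversation rather than a documented template.

```python
import re

# Hypothetical helpers for illustration; not part of transformers or this repo.
FENCE = "`" * 3  # built programmatically so this example contains no nested fences

# Matches a fenced block labelled "argdown" and captures its body.
ARGDOWN_BLOCK = re.compile(FENCE + r"argdown[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def build_argmap_messages(text: str) -> list:
    """Wrap a source text in ':::' delimiters, as in the chat example."""
    prompt = (
        "Can you please create an Argdown argument map that visualizes the "
        "argumentative macro structure of the following text?\n"
        f":::\n{text.strip()}\n:::"
    )
    return [{"role": "user", "content": prompt}]


def extract_argdown(reply: str) -> list:
    """Return the body of every argdown-fenced snippet in an assistant reply."""
    return [body.strip() for body in ARGDOWN_BLOCK.findall(reply)]


if __name__ == "__main__":
    demo_reply = (
        "Here is the map:\n"
        + FENCE + "argdown\n[C]: Claim.\n    <+ [P]: Reason.\n" + FENCE
    )
    print(extract_argdown(demo_reply))
```

The message list returned by `build_argmap_messages` has the same shape as the one passed to `generator(...)` in the Quick start, so it could be used in its place; `extract_argdown` would then be applied to the returned `generated_text`.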
+### Metrics
+*coming soon*
 ## Citations
+Cite HIR as:
 ```bibtex
+@misc{zhang2023wisdomhindsightmakeslanguage,
+      title={The Wisdom of Hindsight Makes Language Models Better Instruction Followers},
+      author={Tianjun Zhang and Fangchen Liu and Justin Wong and Pieter Abbeel and Joseph E. Gonzalez},
+      year={2023},
+      eprint={2302.05206},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2302.05206},
+}
+```
 Cite TRL as: