This model was converted to GGUF format from [`ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small`](https://huggingface.co/ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small) for more details on the model.
---

RpR (RolePlay with Reasoning) is a new series of models from ArliAI. This series builds directly upon the successful dataset curation methodology and training methods developed for the RPMax series.

RpR models use the same curated, deduplicated RP and creative writing dataset used for RPMax, with a focus on variety to ensure high creativity and minimize cross-context repetition. Users familiar with RPMax will recognize the unique, non-repetitive writing style, unlike other RP-finetuned models.

With the release of QwQ as the first high-performing open-source reasoning model that can be easily trained, it was clear that the available instruct and creative-writing reasoning datasets contain only one response per example. This type of single-response dataset, when used to train reasoning models, causes degraded output quality in long multi-turn chats, which is why Arli AI decided to create a real RP model capable of long multi-turn chat with reasoning.

To create RpR, we first had to build a reasoning RP dataset by re-processing our existing known-good RPMax dataset. We used the base QwQ Instruct model itself to generate the reasoning process for every turn in the RPMax conversation examples, then refined the output to make sure the reasoning was in line with the actual response examples from the dataset.
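A minimal sketch of this re-processing step (the helper names are hypothetical, and `generate_reasoning` is a stand-in for prompting the base QwQ model, not a real API):

```python
def generate_reasoning(context, response):
    """Stand-in for prompting the base QwQ model to write a reasoning
    trace that leads to the already-known response from the dataset."""
    return f"Considering the scene so far, the reply should be: {response[:30]}..."

def add_reasoning_to_conversation(conversation):
    """Rebuild one RPMax-style conversation so every assistant turn
    carries a <think> block followed by the original response."""
    context, rebuilt = [], []
    for turn in conversation:
        if turn["role"] == "assistant":
            reasoning = generate_reasoning(context, turn["content"])
            rebuilt.append({
                "role": "assistant",
                "content": f"<think>\n{reasoning}\n</think>\n{turn['content']}",
            })
        else:
            rebuilt.append(turn)
        context.append(turn)
    return rebuilt

convo = [
    {"role": "user", "content": "The tavern door creaks open."},
    {"role": "assistant", "content": "Mira looks up from her ale."},
]
out = add_reasoning_to_conversation(convo)
print(out[1]["content"].startswith("<think>"))  # True
```

The refinement pass mentioned above would then check each generated trace against the original response before it is accepted into the training set.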

Another important thing to get right is making sure the model is trained on examples that present reasoning blocks the same way it encounters them during inference: never seeing the reasoning blocks in its context. To achieve this, the training run was completed using axolotl with a manual, template-free segments dataset, so that the model is never trained to see the reasoning block in its context, just as it will be used at inference time.
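The inference-side half of that convention can be sketched as follows: when building the prompt for the next turn, reasoning blocks are stripped from earlier assistant messages so the model only ever sees clean context (a sketch, assuming `<think>...</think>` delimiters as used by QwQ/R1-style models):

```python
import re

# Matches a reasoning block plus any trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(messages):
    """Remove <think>...</think> blocks from prior assistant turns,
    so the context matches what the model saw during training."""
    cleaned = []
    for msg in messages:
        if msg["role"] == "assistant":
            msg = {**msg, "content": THINK_BLOCK.sub("", msg["content"])}
        cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": "Hello?"},
    {"role": "assistant",
     "content": "<think>\nGreet warmly.\n</think>\nWell met, traveler."},
]
print(strip_reasoning(history)[1]["content"])  # Well met, traveler.
```

Most chat frontends that support reasoning models apply this kind of stripping automatically.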

The result of training on this dataset with this method is consistently coherent and interesting output, even in long multi-turn RP chats. As far as we know, this is the first correctly-trained reasoning model for RP and creative writing.

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
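For example (the repo id and quant filename below are placeholders, not published names — substitute this repo's id and one of its actual `.gguf` files):

```shell
# Install llama.cpp (Homebrew works on macOS and Linux)
brew install llama.cpp

# Chat with the model via the llama.cpp CLI; --hf-repo / --hf-file
# values below are illustrative placeholders.
llama-cli --hf-repo <hf-user>/<gguf-repo> --hf-file <quant>.gguf -p "Hello"
```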