Add pipeline tag and GitHub link
#5
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,20 +1,22 @@
 ---
+library_name: transformers
 license: other
 license_name: tongyi-qianwen
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/e41oJOWxEvcZYiXstfEH_.png
+pipeline_tag: text-generation
 ---

+

 - Try out the model on [](https://featherless.ai/models/featherless-ai/QRWKV-72B)
 - Model details from our blog post here! [](https://substack.recursal.ai/p/qwerky-72b-and-32b-training-large)
 - This model was presented in [RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale](https://huggingface.co/papers/2505.03005).
+- Code: [https://github.com/recursal/RADLADS](https://github.com/recursal/RADLADS)

 Benchmarks are as follows for both QRWKV-QwQ-32B and QRWKV-72B models:

 | Tasks | Metric | QRWKV-QwQ-32B | Qwen/QwQ-32B | QRWKV-72B | Qwen2.5-72B-Instruct |
+|:---:|:---:|:---:|:---:|:---:|:---:|
 | arc_challenge | acc_norm | **0.5640** | 0.5563 | **0.6382** | 0.6323 |
 | arc_easy | acc_norm | 0.7837 | **0.7866** | **0.8443** | 0.8329 |
 | hellaswag | acc_norm | 0.8303 | **0.8407** | 0.8573 | **0.8736** |

@@ -83,4 +85,4 @@ As demonstrated with our QRWKV-72B-Preview and prior models such as QRWKV6-32B I

 As with our previous models, the model's inherent knowledge and dataset training are inherited from its "parent" model. Consequently, unlike previous RWKV models trained on over 100+ languages, the QRWKV model is limited to approximately 30 languages supported by the Qwen line of models.

-You may find our details of the process from our previous release, [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
+You may find our details of the process from our previous release, [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
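The added front-matter keys are what drive the Hub widget and the default loading path: `library_name: transformers` declares which library the checkpoint targets, and `pipeline_tag: text-generation` files the model under text generation. A minimal usage sketch of what that implies for consumers, assuming the Hub repo id `featherless-ai/QRWKV-72B` (matching the Featherless link above) and that the checkpoint loads through the standard `transformers` text-generation pipeline; the `trust_remote_code` flag is a guess in case the RWKV-hybrid architecture ships custom modeling code:

```python
# Hedged sketch, not taken from the model card: load the model the way the new
# `library_name: transformers` / `pipeline_tag: text-generation` metadata suggests.
# The repo id and the trust_remote_code flag are assumptions.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="featherless-ai/QRWKV-72B",   # assumed Hub repo id
    torch_dtype=torch.bfloat16,         # 72B weights; use reduced precision
    device_map="auto",                  # shard across available GPUs
    trust_remote_code=True,             # in case the RWKV-hybrid block needs custom code
)

out = generator("Linear attention lets RWKV-style models", max_new_tokens=64)
print(out[0]["generated_text"])
```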
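The benchmark table reports `acc_norm` on `arc_challenge`, `arc_easy`, and `hellaswag`, which match the task and metric names used by EleutherAI's lm-evaluation-harness; the diff does not say which tool produced the numbers, so the following re-run sketch assumes that harness and the same repo id:

```python
# Hedged sketch: re-run the three README tasks with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). That this harness produced the reported numbers is an assumption.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",  # standard transformers backend
    model_args="pretrained=featherless-ai/QRWKV-72B,dtype=bfloat16,trust_remote_code=True",
    tasks=["arc_challenge", "arc_easy", "hellaswag"],
    batch_size=8,
)

# results["results"] maps each task to its metrics, including the acc_norm values
# that the table reports.
for task, metrics in results["results"].items():
    print(task, metrics)
```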