## Overview

**Llama-3-Swallow-Infused-R1776-70B** is a 70B-parameter merged model built on Meta's **Llama 3** architecture. It combines the distilled reasoning performance of `r1-1776-distill-llama-70b` with the enhanced instruction-following capabilities of the `Swallow` model, making it particularly effective for both English and Japanese instruction tasks.

The foundation of this model is `perplexity-ai/r1-1776-distill-llama-70b`, a distilled model fine-tuned for reasoning tasks on top of Llama 3.3. To boost Japanese language proficiency and overall instruction alignment, we incorporated the ChatVector from `tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4`. This approach - **adding an instruction-tuned model's ChatVector to a reasoning-centric model** - is a novel strategy for enhancing the model's multilingual reasoning capabilities.

## Merge Methodology

This model was created using a weighted linear merge:

```
Llama-3-Swallow-Infused-R1776-70B =
  r1-1776-distill-llama-70b + 0.4 * (
    Swallow-70B-Instruct-v0.4 - Llama-3.3-70B-Instruct
  )
```

- **Base**: `perplexity-ai/r1-1776-distill-llama-70b`
  - A distilled reasoning-focused model built on Meta Llama 3.3.
- **Delta**: Difference between `tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4` and `meta-llama/Llama-3.3-70B-Instruct`.
- **Merge Tool**: [MergeKit](https://github.com/arcee-ai/mergekit)
- **Scaling Factor**: `α = 0.4`
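
For illustration, the per-tensor arithmetic behind this merge can be sketched in plain PyTorch. This is a minimal sketch of the formula above, not the MergeKit invocation actually used; the function name and arguments are illustrative, and the inputs are assumed to be `state_dict()`-style mappings from parameter names to tensors.

```python
import torch


def add_chat_vector(base_sd, instruct_sd, pretrain_sd, alpha=0.4):
    """Return base + alpha * (instruct - pretrain), tensor by tensor.

    base_sd:     r1-1776-distill-llama-70b
    instruct_sd: Llama-3.3-Swallow-70B-Instruct-v0.4
    pretrain_sd: Llama-3.3-70B-Instruct
    """
    merged = {}
    for name, tensor in base_sd.items():
        if name in instruct_sd and name in pretrain_sd:
            # ChatVector = instruct - pretrain; scale by alpha, add to base.
            merged[name] = tensor + alpha * (instruct_sd[name] - pretrain_sd[name])
        else:
            # Tensors missing from either donor checkpoint are carried over unchanged.
            merged[name] = tensor.clone()
    return merged
```

In MergeKit terms, this is equivalent to a task-arithmetic style merge with `meta-llama/Llama-3.3-70B-Instruct` as the reference model, weight 1.0 on the R1776 delta and 0.4 on the Swallow delta (the exact configuration file is not reproduced here).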

Before merging, we performed vocabulary alignment to ensure consistency between the merged components. This step uses [yasu-oh/merge_tools](https://github.com/yasu-oh/merge_tools) to align the vocabulary of the added model with the tokenizer of the base model. This preprocessing step prevents token mismatches and preserves high-quality performance across the merged models.
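
As a rough illustration of what this alignment does (a sketch of the idea only, not the `merge_tools` implementation; the names here are hypothetical), the donor model's embedding rows can be re-indexed to the base tokenizer's token ids, falling back to the base model's own row for tokens the donor lacks so that those rows contribute a zero delta:

```python
import torch


def align_embeddings(donor_emb: torch.Tensor, donor_vocab: dict,
                     base_emb: torch.Tensor, base_vocab: dict) -> torch.Tensor:
    """Re-index donor embedding rows to the base tokenizer's token ids.

    donor_vocab and base_vocab map token strings to row indices, as
    returned by tokenizer.get_vocab(). Tokens absent from the donor keep
    the base model's row, so (aligned - base) is zero for those rows.
    """
    aligned = base_emb.clone()
    for token, base_id in base_vocab.items():
        donor_id = donor_vocab.get(token)
        if donor_id is not None:
            aligned[base_id] = donor_emb[donor_id]
    return aligned
```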

This methodology ensures that the reasoning backbone of R1776 is retained while integrating Swallow's enhancements in instruction tuning and Japanese language support.

## Languages

- English
- Japanese

## Key Features

- Bilingual support: robust performance for both English and Japanese tasks.
- Enhanced reasoning and instruction-following capabilities.
- Novel use of ChatVector addition from instruction-tuned models to a reasoning-centric base.

## Recommended Parameters

- `temperature`: 0.6
- `top_p`: 0.95
- `top_k`: 40
- `min_p`: 0.0
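
For example, with the Hugging Face Transformers generation API (the repository id below is assumed from this card's model name, and `min_p` requires a recent Transformers release):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yasu-oh/Llama-3-Swallow-Infused-R1776-70B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "富士山の標高を教えてください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,  # recommended sampling settings from this card
    top_p=0.95,
    top_k=40,
    min_p=0.0,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
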
## License

This model is distributed under the **Meta Llama 3 Community License**.

Please review and comply with its terms:
[https://www.llama.com/llama3/license/](https://www.llama.com/llama3/license/)

**Key Restrictions Include**:

- Do not use this model to improve competing large language models (LLMs).
- When reusing this model, include the phrase: **"Built with Meta Llama 3."**
- Organizations with more than **700 million monthly active users (MAU)** require a separate license from Meta.
- Model names must include "Llama 3".

## Citations

If you use this model, please cite the original works:

- [perplexity-ai/r1-1776-distill-llama-70b](https://huggingface.co/perplexity-ai/r1-1776-distill-llama-70b)
- [tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4](https://huggingface.co/tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4)
- [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)