yasu-oh committed
Commit 4349a89 · 1 Parent(s): 015bb0b
Files changed (1)
  1. README.md +31 -24
README.md CHANGED
@@ -19,39 +19,46 @@ base_model_relation: merge
  
  ## Overview
  
- **Llama-3-Swallow-Infused-R1776-70B** is a 70B parameter merged model based on Metas **Llama 3** architecture. It combines the distilled instruction-following behavior of `r1-1776` with enhancements derived from the `Swallow` delta over Meta's base Llama 3.3 model.
  
- This composition is particularly suited for English and Japanese instruction tasks, maintaining robustness while introducing sharper alignment capabilities.
  
  ## Merge Methodology
  
  This model was created using a weighted linear merge:
-
  ```
- Llama-3-Swallow-Infused-R1776-70B =
  r1-1776-distill-llama-70b + 0.4 * (
  Swallow-70B-Instruct-v0.4 - Llama-3.3-70B-Instruct
  )
  ```
  
- * **Base**: `perplexity-ai/r1-1776-distill-llama-70b` (MIT License)
- * **Delta**: `tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4` - `meta-llama/Llama-3.3-70B-Instruct`
- * **Merge Tool**: Performed using [MergeKit](https://github.com/arcee-ai/mergekit)
- * **Scaling Factor**: `α = 0.4`
  
- The resulting model maintains the backbone of R1776 while incorporating Swallow's improved instruction tuning.
  
  ## Languages
  
- * English
- * Japanese
  
- ## Recommended parameters
  
- * temperature: 0.6
- * top_p: 0.95
- * top_k: 40
- * min_p: 0.0
  
  ## License
  
@@ -59,17 +66,17 @@ This model is distributed under the **Meta Llama 3 Community License**.
  Please review and comply with its terms:
  [https://www.llama.com/llama3/license/](https://www.llama.com/llama3/license/)
  
- **Key Restrictions Include:**
  
- * Do not use this model to improve competing LLMs.
- * Reuse must include the phrase: **"Built with Meta Llama 3."**
- * For organizations with over **700M MAU**, a separate license from Meta is required.
- * Model name must include “Llama 3”.
  
  ## Citations
  
  If you use this model, please cite the original works:
  
- * Perplexity AI's [r1-1776-distill-llama-70b](https://huggingface.co/perplexity-ai/r1-1776-distill-llama-70b)
- * TokyoTech-LLM's [Llama-3.3-Swallow-70B-Instruct-v0.4](https://huggingface.co/tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4)
- * Meta's [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)

  
  ## Overview
  
+ **Llama-3-Swallow-Infused-R1776-70B** is a 70B-parameter merged model built on Meta's **Llama 3** architecture. It combines the distilled reasoning performance of `r1-1776-distill-llama-70b` with the enhanced instruction-following capabilities of the `Swallow` model, making it particularly effective for both English and Japanese instruction tasks.
  
+ The foundation of this model is `perplexity-ai/r1-1776-distill-llama-70b`, a distilled model fine-tuned for reasoning tasks on top of Llama 3.3. To boost Japanese language proficiency and overall instruction alignment, we incorporated the ChatVector from `tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4`. This approach of **adding an instruction-tuned model's ChatVector to a reasoning-centric model** is a novel strategy for enhancing the model's multilingual reasoning capabilities.
  
  ## Merge Methodology
  
  This model was created using a weighted linear merge:
  ```
+ Llama-3-Swallow-Infused-R1776-70B =
  r1-1776-distill-llama-70b + 0.4 * (
  Swallow-70B-Instruct-v0.4 - Llama-3.3-70B-Instruct
  )
  ```
+ - **Base**: `perplexity-ai/r1-1776-distill-llama-70b`
+   - A distilled reasoning-focused model built on Meta Llama 3.3.
+ - **Delta**: Difference between `tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4` and `meta-llama/Llama-3.3-70B-Instruct`.
+ - **Merge Tool**: [MergeKit](https://github.com/arcee-ai/mergekit)
+ - **Scaling Factor**: `α = 0.4`
  
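As a rough illustration, the formula above amounts to simple tensor arithmetic over the three checkpoints. The sketch below is a minimal approximation, not the actual MergeKit run; it assumes all three models share tensor names and shapes after vocabulary alignment and that enough memory is available to hold them:

```python
# Minimal sketch of the weighted linear merge described by the formula above.
# Not the exact MergeKit procedure used for the released model; assumes all
# three checkpoints share tensor names/shapes and fit in memory.
import torch
from transformers import AutoModelForCausalLM

ALPHA = 0.4  # scaling factor for the Swallow delta

base = AutoModelForCausalLM.from_pretrained(
    "perplexity-ai/r1-1776-distill-llama-70b", torch_dtype=torch.bfloat16)
swallow = AutoModelForCausalLM.from_pretrained(
    "tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4", torch_dtype=torch.bfloat16)
llama = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct", torch_dtype=torch.bfloat16)

swallow_sd = swallow.state_dict()
llama_sd = llama.state_dict()

# merged = r1-1776 + ALPHA * (Swallow - Llama-3.3), applied tensor by tensor
with torch.no_grad():
    for name, tensor in base.state_dict().items():
        if name in swallow_sd and name in llama_sd:
            tensor += ALPHA * (swallow_sd[name] - llama_sd[name])

base.save_pretrained("Llama-3-Swallow-Infused-R1776-70B")
```

In MergeKit terms, the same arithmetic should be expressible as a `task_arithmetic` merge with `meta-llama/Llama-3.3-70B-Instruct` as the base model, `r1-1776-distill-llama-70b` at weight 1.0, and the Swallow model at weight 0.4.
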
+ Before merging, we performed vocabulary alignment to ensure consistency between the merge components: [yasu-oh/merge_tools](https://github.com/yasu-oh/merge_tools) is used to align the vocabulary of the added model with the tokenizer of the base model. This preprocessing prevents token mismatches and preserves the quality of the merged model.
  
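The alignment itself is handled by merge_tools; purely as a conceptual illustration (an assumption about the approach, not code from that repository), re-indexing the added model's embedding rows to the base tokenizer could look like this:

```python
# Conceptual sketch of vocabulary alignment, assumed for illustration only;
# yasu-oh/merge_tools may implement this differently. Idea: reorder the added
# model's embedding rows so they follow the base model's tokenizer, falling
# back to the base row (zero delta) for tokens the added model lacks.
import torch
from transformers import AutoTokenizer

base_tok = AutoTokenizer.from_pretrained("perplexity-ai/r1-1776-distill-llama-70b")
added_tok = AutoTokenizer.from_pretrained("tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4")

def align_rows(added_embed: torch.Tensor, base_embed: torch.Tensor) -> torch.Tensor:
    """Reorder rows of `added_embed` to match the base tokenizer's vocabulary."""
    added_vocab = added_tok.get_vocab()      # token string -> row index
    aligned = base_embed.clone()             # default: keep base rows (delta = 0)
    for token, base_idx in base_tok.get_vocab().items():
        if token in added_vocab:
            aligned[base_idx] = added_embed[added_vocab[token]]
    return aligned
```
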
+ This methodology retains the reasoning backbone of R1776 while integrating Swallow's enhancements in instruction tuning and Japanese language support.
  
  ## Languages
  
+ - English
+ - Japanese
+
+ ## Key Features
+
+ - Bilingual support: robust performance for both English and Japanese tasks.
+ - Enhanced reasoning and instruction-following capabilities.
+ - Novel use of ChatVector addition from instruction-tuned models to a reasoning-centric base.
  
+ ## Recommended Parameters
  
+ - `temperature`: 0.6
+ - `top_p`: 0.95
+ - `top_k`: 40
+ - `min_p`: 0.0
  
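The settings above can be passed directly to most inference stacks; below is a minimal example using the Transformers `generate()` API (the repository name is assumed, and `min_p` requires a reasonably recent `transformers` release):

```python
# Minimal inference example using the recommended sampling parameters.
# The model id below is an assumption; adjust device/dtype to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yasu-oh/Llama-3-Swallow-Infused-R1776-70B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Summarize the ChatVector merging idea in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,        # sampling must be enabled for these parameters to apply
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    min_p=0.0,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
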
  ## License
  
  Please review and comply with its terms:
  [https://www.llama.com/llama3/license/](https://www.llama.com/llama3/license/)
  
+ **Key Restrictions Include**:
  
+ - Do not use this model to improve competing large language models (LLMs).
+ - When reusing this model, include the phrase: **"Built with Meta Llama 3."**
+ - Organizations with more than **700 million monthly active users (MAU)** require a separate license from Meta.
+ - Model names must include “Llama 3”.
  
  ## Citations
  
  If you use this model, please cite the original works:
  
+ - [perplexity-ai/r1-1776-distill-llama-70b](https://huggingface.co/perplexity-ai/r1-1776-distill-llama-70b)
+ - [tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4](https://huggingface.co/tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4)
+ - [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)