## Overview

**Llama-3-Swallow-Infused-R1776-70B** is a 70B-parameter merged model built on Meta's **Llama 3** architecture. It combines the distilled reasoning performance of `r1-1776-distill-llama-70b` with the enhanced instruction-following capabilities of the `Swallow` model, making it particularly effective for both English and Japanese instruction tasks.

The foundation of this model is `perplexity-ai/r1-1776-distill-llama-70b`, a distilled model fine-tuned for reasoning tasks on top of Llama 3.3. To boost Japanese language proficiency and overall instruction alignment, we incorporated the ChatVector from `tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4`. This approach - **adding an instruction-tuned model's ChatVector to a reasoning-centric model** - is a novel strategy for enhancing the model's multilingual reasoning capabilities.

## Merge Methodology

This model was created using a weighted linear merge:

```
Llama-3-Swallow-Infused-R1776-70B =
  r1-1776-distill-llama-70b + 0.4 * (
    Swallow-70B-Instruct-v0.4 - Llama-3.3-70B-Instruct
  )
```

- **Base**: `perplexity-ai/r1-1776-distill-llama-70b`
  - A distilled reasoning-focused model built on Meta Llama 3.3.
- **Delta**: Difference between `tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4` and `meta-llama/Llama-3.3-70B-Instruct`.
- **Merge Tool**: [MergeKit](https://github.com/arcee-ai/mergekit)
- **Scaling Factor**: `α = 0.4`
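
For illustration, the per-tensor arithmetic behind this merge can be sketched in plain PyTorch. This is a minimal sketch of the formula above, not the MergeKit invocation actually used; the function name and arguments are illustrative, and the inputs are assumed to be `state_dict()`-style mappings from parameter names to tensors.

```python
import torch


def add_chat_vector(base_sd, instruct_sd, pretrain_sd, alpha=0.4):
    """Return base + alpha * (instruct - pretrain), tensor by tensor.

    base_sd:     r1-1776-distill-llama-70b
    instruct_sd: Llama-3.3-Swallow-70B-Instruct-v0.4
    pretrain_sd: Llama-3.3-70B-Instruct
    """
    merged = {}
    for name, tensor in base_sd.items():
        if name in instruct_sd and name in pretrain_sd:
            # ChatVector = instruct - pretrain; scale by alpha, add to base.
            merged[name] = tensor + alpha * (instruct_sd[name] - pretrain_sd[name])
        else:
            # Tensors missing from either donor checkpoint are carried over unchanged.
            merged[name] = tensor.clone()
    return merged
```

In MergeKit terms, this is equivalent to a task-arithmetic style merge with `meta-llama/Llama-3.3-70B-Instruct` as the reference model, weight 1.0 on the R1776 delta and 0.4 on the Swallow delta (the exact configuration file is not reproduced here).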

Before merging, we performed vocabulary alignment to ensure consistency between the merged components. This step uses [yasu-oh/merge_tools](https://github.com/yasu-oh/merge_tools) to align the vocabulary of the added model with the tokenizer of the base model. This preprocessing step prevents token mismatches and preserves high-quality performance across the merged models.
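
As a rough illustration of what this alignment does (a sketch of the idea only, not the `merge_tools` implementation; the names here are hypothetical), the donor model's embedding rows can be re-indexed to the base tokenizer's token ids, falling back to the base model's own row for tokens the donor lacks so that those rows contribute a zero delta:

```python
import torch


def align_embeddings(donor_emb: torch.Tensor, donor_vocab: dict,
                     base_emb: torch.Tensor, base_vocab: dict) -> torch.Tensor:
    """Re-index donor embedding rows to the base tokenizer's token ids.

    donor_vocab and base_vocab map token strings to row indices, as
    returned by tokenizer.get_vocab(). Tokens absent from the donor keep
    the base model's row, so (aligned - base) is zero for those rows.
    """
    aligned = base_emb.clone()
    for token, base_id in base_vocab.items():
        donor_id = donor_vocab.get(token)
        if donor_id is not None:
            aligned[base_id] = donor_emb[donor_id]
    return aligned
```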

This methodology ensures that the reasoning backbone of R1776 is retained while integrating Swallow's enhancements in instruction tuning and Japanese language support.

## Languages

- English
- Japanese

## Key Features

- Bilingual support: robust performance for both English and Japanese tasks.
- Enhanced reasoning and instruction-following capabilities.
- Novel use of ChatVector addition from instruction-tuned models to a reasoning-centric base.

## Recommended Parameters

- `temperature`: 0.6
- `top_p`: 0.95
- `top_k`: 40
- `min_p`: 0.0
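
For example, with the Hugging Face Transformers generation API (the repository id below is assumed from this card's model name, and `min_p` requires a recent Transformers release):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yasu-oh/Llama-3-Swallow-Infused-R1776-70B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "富士山の標高を教えてください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,  # recommended sampling settings from this card
    top_p=0.95,
    top_k=40,
    min_p=0.0,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
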
## License

This model is distributed under the **Meta Llama 3 Community License**.

Please review and comply with its terms:
[https://www.llama.com/llama3/license/](https://www.llama.com/llama3/license/)

**Key Restrictions Include**:

- Do not use this model to improve competing large language models (LLMs).
- When reusing this model, include the phrase: **"Built with Meta Llama 3."**
- Organizations with more than **700 million monthly active users (MAU)** require a separate license from Meta.
- Model names must include "Llama 3".

## Citations

If you use this model, please cite the original works:

- [perplexity-ai/r1-1776-distill-llama-70b](https://huggingface.co/perplexity-ai/r1-1776-distill-llama-70b)
- [tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4](https://huggingface.co/tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4)
- [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)