---
license: mit
library_name: transformers
tags:
- mergekit
- merge
base_model:
- Qwen/Qwen2.5-7B-Instruct-1M
- Sakalti/SJT-7B-1M
- Triangle104/Q2.5-Instruct-1M_Harmony
- bunnycore/Qwen2.5-7B-RRP-1M
- huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
model-index:
- name: Qwen2.5-7B-CelestialHarmony-1M
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 59.44
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 34.51
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 33.01
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.17
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 16.74
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.63
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
---
					
						
# ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M

**ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M** is a custom merged language model based on **Qwen2.5-7B** with enhanced reasoning, roleplaying, and long-context capabilities. It supports context lengths of up to **1 million tokens**, making it well suited for ultra-long text processing, deep reasoning tasks, and immersive roleplay interactions.

Quants are available in GGUF format, provided by [mradermacher](https://huggingface.co/mradermacher); a short example of running one locally follows the links.
1. [GGUF](https://huggingface.co/mradermacher/Qwen2.5-7B-CelestialHarmony-1M-GGUF)
2. [imatrix GGUF](https://huggingface.co/mradermacher/Qwen2.5-7B-CelestialHarmony-1M-i1-GGUF)
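
For quick local inference, one of these quants can be downloaded and run with llama.cpp. A minimal sketch, assuming a recent llama.cpp build with `llama-cli` available; the exact quant filename (`Q4_K_M` here) is an assumption, so check the repository's file listing:

```bash
# Download a single quant from the mradermacher repo (filename is an assumption; check the repo's files).
huggingface-cli download mradermacher/Qwen2.5-7B-CelestialHarmony-1M-GGUF \
  Qwen2.5-7B-CelestialHarmony-1M.Q4_K_M.gguf --local-dir ./models

# Start an interactive chat session with llama.cpp.
llama-cli -m ./models/Qwen2.5-7B-CelestialHarmony-1M.Q4_K_M.gguf \
  -p "You are a wise celestial storyteller." -cnv
```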
					
						
---

## **Model Details**
- **Base Model**: `Qwen/Qwen2.5-7B-Instruct-1M`
- **Models Used in Merge**:
  - `Qwen/Qwen2.5-7B-Instruct-1M`
  - `bunnycore/Qwen2.5-7B-RRP-1M`
  - `Triangle104/Q2.5-Instruct-1M_Harmony`
  - `Sakalti/SJT-7B-1M`
  - `huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated`
- **Merge Method**: `model_stock` (optimized layer-wise weight averaging)

---
					
						

## **Overview**
**Qwen2.5-7B-CelestialHarmony-1M** enhances the **Qwen2.5-7B series** with a balanced blend of roleplaying dynamics, structured reasoning, and long-context memory. The model is particularly well suited for:
- **Roleplaying**: Immersive character-based storytelling with deep contextual awareness.
- **Reasoning & Thought Processing**: Structured logical thinking, especially when prompted with `<think>` tags.
- **Ultra-Long Context Handling**: Efficient processing of sequences up to **1,010,000 tokens** using optimized sparse attention.

---

					
						
## **Technical Specifications**

| Specification | Value |
|---------------|-------|
| **Model Type** | Causal Language Model |
| **Parameters** | 7.61B |
| **Non-Embedding Parameters** | 6.53B |
| **Layers** | 28 |
| **Attention Heads (GQA)** | 28 (Q), 4 (KV) |
| **Max Context Length** | 1,010,000 tokens |
| **Max Generation Length** | 8,192 tokens |
| **Merge Method** | Model Stock |
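
These values mirror the Qwen2.5-7B-Instruct-1M base architecture. A small sketch for checking them directly from the published configuration (assumes the standard Qwen2 config fields; only `config.json` is fetched, no weights):

```python
from transformers import AutoConfig

# Fetches only the model configuration, not the weights.
config = AutoConfig.from_pretrained("ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M")

print("layers:", config.num_hidden_layers)              # transformer layers
print("query heads:", config.num_attention_heads)       # Q heads
print("key/value heads:", config.num_key_value_heads)   # KV heads (GQA)
print("max positions:", config.max_position_embeddings) # maximum context length
```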
					
						

---

## **Merging Details**
This model was merged with the **Model Stock** method, which averages the weights of several fine-tuned models around a shared base to produce a balanced, well-generalizing result.

### **Merge YAML Configuration**
```yaml
base_model: Qwen/Qwen2.5-7B-Instruct-1M
dtype: bfloat16
merge_method: model_stock
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Triangle104/Q2.5-Instruct-1M_Harmony
  - model: Sakalti/SJT-7B-1M
  - model: bunnycore/Qwen2.5-7B-RRP-1M
  - model: huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
tokenizer_source: Qwen/Qwen2.5-7B-Instruct-1M
```
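
To reproduce the merge, the configuration above can be saved to a file (here `celestial_harmony.yaml`, a hypothetical name) and run through mergekit's CLI. A minimal sketch; `--cuda` is optional and only useful when a GPU is available:

```bash
# Install mergekit (also installable from the GitHub repository).
pip install mergekit

# Merge the listed models into ./Qwen2.5-7B-CelestialHarmony-1M using the config above.
mergekit-yaml celestial_harmony.yaml ./Qwen2.5-7B-CelestialHarmony-1M --cuda
```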
					
						

---

## **Quickstart**
### **Install Required Packages**
Ensure you have a recent `transformers` along with `torch` and `accelerate` installed:
```bash
pip install transformers torch accelerate
```

### **Load and Use the Model**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M"

# Load the model in its native precision and spread it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Tell me a short story about an ancient celestial warrior."
messages = [
    {"role": "system", "content": "You are a wise celestial storyteller."},
    {"role": "user", "content": prompt}
]

# Apply the chat template, then generate and decode the response.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
```
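
The `<think>` prompting mentioned in the overview is a convention rather than a separate API. A minimal sketch that continues the snippet above (it reuses `model` and `tokenizer`); whether the model consistently wraps its reasoning in the tags depends on the prompt:

```python
# Continues the example above: request explicit reasoning inside <think> tags.
messages = [
    {"role": "system", "content": "Think step by step inside <think>...</think> tags, then state the final answer."},
    {"role": "user", "content": "A caravan covers 42 km per day. How many days does a 294 km journey take?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=256)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```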
					
						

---

## **Optimized Deployment with vLLM**
For long-context inference, use Qwen's **vLLM** branch with dual chunk attention:
```bash
git clone -b dev/dual-chunk-attn [email protected]:QwenLM/vllm.git
cd vllm
pip install -e . -v
```
Run the model:
```bash
vllm serve ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M \
  --tensor-parallel-size 4 \
  --max-model-len 1010000 \
  --enable-chunked-prefill --max-num-batched-tokens 131072 \
  --enforce-eager \
  --max-num-seqs 1
```
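
Once the server is up, it exposes vLLM's OpenAI-compatible API (port 8000 by default). A minimal client sketch with the `openai` package; the API key is a placeholder unless the server was started with one:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M",
    messages=[
        {"role": "system", "content": "You are a wise celestial storyteller."},
        {"role": "user", "content": "Tell me a short story about an ancient celestial warrior."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```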
					
						

---

## **Model Capabilities**
✅ **Roleplay & Storytelling** – Designed for engaging interactions.
✅ **Long-Context Awareness** – Handles texts up to **1M tokens**.
✅ **Logical Thinking & Reasoning** – Supports `<think>` tags to enhance thought structuring.
✅ **Optimized Merge Strategy** – Uses `model_stock` for strong generalization.

---
					
						

## **Acknowledgments**
This model is built on top of **Qwen2.5-7B**, with contributions from **bunnycore, Triangle104, Sakalti, and huihui-ai**, leveraging the **Model Stock** merging methodology.

For further details, see:
- [Qwen2.5-1M Technical Report](https://arxiv.org/abs/2501.15383)
- [MergeKit Documentation](https://github.com/arcee-ai/mergekit)
- [vLLM for Long-Context Inference](https://github.com/QwenLM/vllm)

---
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/ZeroXClem__Qwen2.5-7B-CelestialHarmony-1M-details).

|      Metric       |Value|
|-------------------|----:|
|Avg.               |31.75|
|IFEval (0-Shot)    |59.44|
|BBH (3-Shot)       |34.51|
|MATH Lvl 5 (4-Shot)|33.01|
|GPQA (0-shot)      | 9.17|
|MuSR (0-shot)      |16.74|
|MMLU-PRO (5-shot)  |37.63|