Update README.md
README.md CHANGED

@@ -7,8 +7,9 @@ tags:
 - llm
 - text_generation
 ---
-# LLaDA2-mini-preview
-
+# LLaDA2.0-mini-preview
+
+**LLaDA2.0-mini-preview** is a diffusion language model featuring a 16BA1B Mixture-of-Experts (MoE) architecture. As an enhanced, instruction-tuned iteration of the LLaDA series, it is optimized for practical applications.
 
 <div align="center">
 <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*DeZ9RKxU-LoAAAAAgQAAAAgAemJ7AQ/original" width="800" />
@@ -51,7 +52,7 @@
 + **Leading MoE Architecture**:
 The open-source **Mixture-of-Experts (MoE) diffusion large language model**, pre-trained from scratch on approximately **20 trillion tokens**.
 + **Efficient Inference**:
-With **16 billion total parameters**, only **1.4 billion** are activated during inference.
+With **16 billion total parameters**, only **1.4 billion** are activated during inference. LLaDA2.0-mini-preview significantly reduces computational costs while outperforming open-source dense models of similar scale.
 + **Impressive Performance on Code & Complex Reasoning**:
 Excels in tasks such as **code generation** and **advanced mathematical reasoning**, demonstrating strong reasoning capabilities.
 + **Tool Use**:
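To make the *Efficient Inference* bullet above concrete: in an MoE layer, a router sends each token to a small top-k subset of experts, so only that subset's parameters are activated per token. Below is a minimal toy sketch of top-k routing in PyTorch; the expert count, gating, and shapes are illustrative assumptions, not LLaDA2.0's actual configuration (the README states only the 16B-total / 1.4B-active split).

```python
import torch
import torch.nn as nn

def moe_forward(x, experts, router, k=2):
    """Top-k routing: each token activates only k of the experts."""
    weights, idx = router(x).softmax(dim=-1).topk(k, dim=-1)   # [T, k] gates
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        hit = (idx == e).any(dim=-1)             # tokens routed to expert e
        if hit.any():
            w = weights[hit][idx[hit] == e].unsqueeze(-1)  # gate per token
            out[hit] += w * expert(x[hit])       # only these rows run expert e
    return out

# Toy numbers: 8 experts but k=2 active per token, so per-token compute
# scales with k, not with the total expert count.
d, n_experts = 64, 8
experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))
router = nn.Linear(d, n_experts)
x = torch.randn(10, d)
print(moe_forward(x, experts, router).shape)  # torch.Size([10, 64])
```

With 8 experts and k = 2, each token touches a quarter of the expert parameters; the same principle is what yields the 1.4B-of-16B activation ratio quoted above.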
@@ -64,13 +65,13 @@ Fully open-source with commitment to transparency. We plan to release a **leadin
 ## 📦 Model Variants
 | Model ID | Description | Hugging Face Link |
 | --- | --- | --- |
-| `inclusionAI/LLaDA2-mini-preview` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-mini-preview) |
+| `inclusionAI/LLaDA2.0-mini-preview` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-mini-preview) |
 
 
 ---
 
 ## 📖 Model Overview
-**LLaDA2-mini-preview** has the following specifications:
+**LLaDA2.0-mini-preview** has the following specifications:
 
 + **Type**: Mixture-of-Experts (MoE) Diffusion Language Model
 + **Total Parameters (Non-Embedding)**: 16B
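The *Type* entry above is the distinctive part: unlike an autoregressive decoder, the LLaDA family generates by iteratively denoising a masked sequence, predicting every position in parallel and committing the most confident tokens each step. The sketch below is a schematic toy of that unmasking loop, not the model's real sampler (which ships with its `trust_remote_code` implementation); `model_logits`, the `MASK` id, and the step budget are all stand-ins.

```python
import torch

MASK = 0      # stand-in id for the [MASK] token
VOCAB = 100   # toy vocabulary size

def model_logits(tokens):
    """Stand-in for the network: scores for every position in parallel."""
    return torch.randn(tokens.shape[0], VOCAB)

def diffusion_decode(length=16, steps=4):
    tokens = torch.full((length,), MASK)        # start fully masked
    per_step = length // steps                  # unmasking budget per step
    for _ in range(steps):
        logits = model_logits(tokens)
        logits[:, MASK] = float("-inf")         # never predict the mask id
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        conf[tokens != MASK] = float("-inf")    # committed tokens stay fixed
        slots = conf.topk(per_step).indices     # most confident masked slots
        tokens[slots] = pred[slots]             # commit them
    return tokens

print(diffusion_decode())  # 16 tokens produced in 4 parallel rounds
```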
@@ -91,7 +92,7 @@ import torch.nn.functional as F
 from transformers import AutoModelForCausalLM
 from transformers import AutoTokenizer
 
-model_path = "/path/to/LLaDA2-mini-preview"
+model_path = "/path/to/LLaDA2.0-mini-preview"
 device = "cuda:0"
 model = AutoModelForCausalLM.from_pretrained(
     model_path, trust_remote_code=True, device_map=device
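The hunk above stops mid-call only because the diff window ends at line 97. For orientation, here is a minimal end-to-end sketch of how the renamed checkpoint would be used; everything past `from_pretrained` — the dtype, the chat-template call, and especially `generate()` and its arguments — is an assumption following the usual `transformers` chat pattern, not text from this diff.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "inclusionAI/LLaDA2.0-mini-preview"   # hub id, or a local path
device = "cuda:0"

# trust_remote_code pulls in the repo's custom diffusion-model code.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map=device,
    torch_dtype=torch.bfloat16,   # assumed; pick what your GPU supports
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

messages = [{"role": "user", "content": "Why does the sky appear blue?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

# Assumed entry point: the repo's remote code is expected to provide generation.
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```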