Update README.md
README.md CHANGED

@@ -7,8 +7,9 @@ tags:
 - llm
 - text_generation
 ---
-# LLaDA2-mini-preview
-
+# LLaDA2.0-mini-preview
+
+**LLaDA2.0-mini-preview** is a diffusion language model featuring a 16BA1B Mixture-of-Experts (MoE) architecture. As an enhanced, instruction-tuned iteration of the LLaDA series, it is optimized for practical applications.
 
 <div align="center">
 <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*DeZ9RKxU-LoAAAAAgQAAAAgAemJ7AQ/original" width="800" />
@@ -51,7 +52,7 @@
 + **Leading MoE Architecture**:
 The open-source **Mixture-of-Experts (MoE) diffusion large language model**, pre-trained from scratch on approximately **20 trillion tokens**.
 + **Efficient Inference**:
-With **16 billion total parameters**, only **1.4 billion** are activated during inference.
+With **16 billion total parameters**, only **1.4 billion** are activated during inference. LLaDA2.0-mini-preview significantly reduces computational costs while outperforming open-source dense models of similar scale.
 + **Impressive Performance on Code & Complex Reasoning**:
 Excels in tasks such as **code generation** and **advanced mathematical reasoning**, demonstrating strong reasoning capabilities.
 + **Tool Use**:
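To make the *Efficient Inference* bullet above concrete: in an MoE layer, a router sends each token to a small top-k subset of experts, so only that subset's parameters are activated per token. Below is a minimal toy sketch of top-k routing in PyTorch; the expert count, gating, and shapes are illustrative assumptions, not LLaDA2.0's actual configuration (the README states only the 16B-total / 1.4B-active split).

```python
import torch
import torch.nn as nn

def moe_forward(x, experts, router, k=2):
    """Top-k routing: each token activates only k of the experts."""
    weights, idx = router(x).softmax(dim=-1).topk(k, dim=-1)   # [T, k] gates
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        hit = (idx == e).any(dim=-1)             # tokens routed to expert e
        if hit.any():
            w = weights[hit][idx[hit] == e].unsqueeze(-1)  # gate per token
            out[hit] += w * expert(x[hit])       # only these rows run expert e
    return out

# Toy numbers: 8 experts but k=2 active per token, so per-token compute
# scales with k, not with the total expert count.
d, n_experts = 64, 8
experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))
router = nn.Linear(d, n_experts)
x = torch.randn(10, d)
print(moe_forward(x, experts, router).shape)  # torch.Size([10, 64])
```

With 8 experts and k = 2, each token touches a quarter of the expert parameters; the same principle is what yields the 1.4B-of-16B activation ratio quoted above.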
@@ -64,13 +65,13 @@ Fully open-source with commitment to transparency. We plan to release a **leadin
 ## 📦 Model Variants
 | Model ID | Description | Hugging Face Link |
 | --- | --- | --- |
-| `inclusionAI/LLaDA2-mini-preview` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-mini-preview) |
+| `inclusionAI/LLaDA2.0-mini-preview` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-mini-preview) |
 
 
 ---
 
 ## 📖 Model Overview
-**LLaDA2-mini-preview** has the following specifications:
+**LLaDA2.0-mini-preview** has the following specifications:
 
 + **Type**: Mixture-of-Experts (MoE) Diffusion Language Model
 + **Total Parameters (Non-Embedding)**: 16B
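The *Type* entry above is the distinctive part: unlike an autoregressive decoder, the LLaDA family generates by iteratively denoising a masked sequence, predicting every position in parallel and committing the most confident tokens each step. The sketch below is a schematic toy of that unmasking loop, not the model's real sampler (which ships with its `trust_remote_code` implementation); `model_logits`, the `MASK` id, and the step budget are all stand-ins.

```python
import torch

MASK = 0      # stand-in id for the [MASK] token
VOCAB = 100   # toy vocabulary size

def model_logits(tokens):
    """Stand-in for the network: scores for every position in parallel."""
    return torch.randn(tokens.shape[0], VOCAB)

def diffusion_decode(length=16, steps=4):
    tokens = torch.full((length,), MASK)        # start fully masked
    per_step = length // steps                  # unmasking budget per step
    for _ in range(steps):
        logits = model_logits(tokens)
        logits[:, MASK] = float("-inf")         # never predict the mask id
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        conf[tokens != MASK] = float("-inf")    # committed tokens stay fixed
        slots = conf.topk(per_step).indices     # most confident masked slots
        tokens[slots] = pred[slots]             # commit them
    return tokens

print(diffusion_decode())  # 16 tokens produced in 4 parallel rounds
```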
@@ -91,7 +92,7 @@ import torch.nn.functional as F
 from transformers import AutoModelForCausalLM
 from transformers import AutoTokenizer
 
-model_path = "/path/to/LLaDA2-mini-preview"
+model_path = "/path/to/LLaDA2.0-mini-preview"
 device = "cuda:0"
 model = AutoModelForCausalLM.from_pretrained(
     model_path, trust_remote_code=True, device_map=device
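The hunk above stops mid-call only because the diff window ends at line 97. For orientation, here is a minimal end-to-end sketch of how the renamed checkpoint would be used; everything past `from_pretrained` — the dtype, the chat-template call, and especially `generate()` and its arguments — is an assumption following the usual `transformers` chat pattern, not text from this diff.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "inclusionAI/LLaDA2.0-mini-preview"   # hub id, or a local path
device = "cuda:0"

# trust_remote_code pulls in the repo's custom diffusion-model code.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map=device,
    torch_dtype=torch.bfloat16,   # assumed; pick what your GPU supports
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

messages = [{"role": "user", "content": "Why does the sky appear blue?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

# Assumed entry point: the repo's remote code is expected to provide generation.
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```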