lccurious committed
Commit 0da4838 · verified · 1 Parent(s): d25d3b2

Update README.md

Files changed (1):
README.md (+7 -6)
README.md CHANGED
@@ -7,8 +7,9 @@ tags:
 - llm
 - text_generation
 ---
-# LLaDA2-mini-preview
-**LLaDA2-mini-preview** is a diffusion language model featuring a 16BA1B Mixture-of-Experts (MoE) architecture. As an enhanced, instruction-tuned iteration of the LLaDA series, it is optimized for practical applications.
+# LLaDA2.0-mini-preview
+
+**LLaDA2.0-mini-preview** is a diffusion language model featuring a 16BA1B Mixture-of-Experts (MoE) architecture. As an enhanced, instruction-tuned iteration of the LLaDA series, it is optimized for practical applications.
 
 <div align="center">
 <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*DeZ9RKxU-LoAAAAAgQAAAAgAemJ7AQ/original" width="800" />
@@ -51,7 +52,7 @@ tags:
 + **Leading MoE Architecture**:
 The open-source **Mixture-of-Experts (MoE) diffusion large language model**, pre-trained from scratch on approximately **20 trillion tokens**.
 + **Efficient Inference**:
-With **16 billion total parameters**, only **1.4 billion** are activated during inference. LLaDA-mini-preview significantly reduces computational costs while outperforming open-source dense models of similar scale.
+With **16 billion total parameters**, only **1.4 billion** are activated during inference. LLaDA2.0-mini-preview significantly reduces computational costs while outperforming open-source dense models of similar scale.
 + **Impressive Performance on Code & Complex Reasoning**:
 Excels in tasks such as **code generation** and **advanced mathematical reasoning**, demonstrating strong reasoning capabilities.
 + **Tool Use**:
@@ -64,13 +65,13 @@ Fully open-source with commitment to transparency. We plan to release a **leadin
 ## 📦 Model Variants
 | Model ID | Description | Hugging Face Link |
 | --- | --- | --- |
-| `inclusionAI/LLaDA2-mini-preview` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-mini-preview) |
+| `inclusionAI/LLaDA2.0-mini-preview` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-mini-preview) |
 
 
 ---
 
 ## 🔍 Model Overview
-**LLaDA2-mini-preview** has the following specifications:
+**LLaDA2.0-mini-preview** has the following specifications:
 
 + **Type**: Mixture-of-Experts (MoE) Diffusion Language Model
 + **Total Parameters (Non-Embedding)**: 16B
@@ -91,7 +92,7 @@ import torch.nn.functional as F
 from transformers import AutoModelForCausalLM
 from transformers import AutoTokenizer
 
-model_path = "/path/to/LLaDA2-mini-preview"
+model_path = "/path/to/LLaDA2.0-mini-preview"
 device = "cuda:0"
 model = AutoModelForCausalLM.from_pretrained(
 model_path, trust_remote_code=True, device_map=device
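
For context, the last hunk only shows the lines of the README's quick-start that surround the renamed `model_path`. Below is a minimal sketch of how the renamed checkpoint would be loaded, assuming the rest of the snippet uses standard `transformers` calls; the tokenizer line is an assumption based on the `AutoTokenizer` import, and the model's diffusion-specific generation routine is not part of this diff, so it is not reproduced here.

```python
# Minimal sketch of the loading step touched by this commit.
# Only the imports and the from_pretrained call appear in the diff hunks;
# the tokenizer call is an assumption inferred from the AutoTokenizer import.
import torch.nn.functional as F  # imported in the README's snippet
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "/path/to/LLaDA2.0-mini-preview"  # renamed local checkpoint path
device = "cuda:0"

# trust_remote_code=True pulls in the custom MoE diffusion modeling code
# shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, device_map=device
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Generation uses the model's own diffusion-based routine described elsewhere
# in the README and is not shown here.
```

If loading directly from the Hub rather than a local path, the renamed model ID from the Model Variants table, `inclusionAI/LLaDA2.0-mini-preview`, can be passed to `from_pretrained` in place of `model_path`.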