exlaw committed 5b429a8 · verified · 1 Parent(s): d9f58e0

Upload folder using huggingface_hub

Files changed (1): README.md (+85, -0)
---
license: apache-2.0
language:
- en
- zh
base_model: Qwen/Qwen3-8B
pipeline_tag: text-generation
tags:
- language model
- parallel-decoding
---

# WeDLM-8B

**WeDLM-8B** is a diffusion language model that performs parallel decoding under standard causal attention. It is initialized from [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).

This is the **base (pretrained)** version. For the instruction-tuned version, see [WeDLM-8B-Instruct](https://huggingface.co/tencent/WeDLM-8B-Instruct).

📄 Paper (Coming Soon) | 🌐 [Project Page](https://wedlm.github.io) | 💻 [GitHub](https://github.com/tencent/WeDLM)

## Model Details

| Attribute | Value |
|:----------|:------|
| Initialized From | [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) |
| Parameters | 8B |
| Context Length | 32,768 tokens |

## Quick Start (Recommended)

For **fast inference**, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from wedlm import LLM, SamplingParams

# Load the model into the wedlm parallel-decoding engine
llm = LLM(model="tencent/WeDLM-8B")

prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(max_tokens=256))

print(outputs[0]["text"])
```
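
Since `generate` accepts a list of prompts, batched decoding follows the same pattern. A minimal sketch that reuses only the calls shown above:

```python
from wedlm import LLM, SamplingParams

# Batched generation: pass several prompts in a single call.
# Uses only the API surface shown in the Quick Start snippet.
llm = LLM(model="tencent/WeDLM-8B")

prompts = [
    "The theory of relativity states that",
    "In number theory, a prime number is",
]
outputs = llm.generate(prompts, SamplingParams(max_tokens=128))

for prompt, out in zip(prompts, outputs):
    print(f"{prompt!r} -> {out['text'][:80]}")
```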

## HuggingFace Transformers

For **training** or simple forward passes, you can load the model via Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-8B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-8B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# A single forward pass (no generation loop)
inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)
outputs = model(**inputs)
```
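
The forward pass above returns the usual Transformers output object. A minimal sketch for inspecting the next-token distribution, assuming this remote-code model follows the standard causal-LM convention of exposing `outputs.logits` with shape `(batch, seq_len, vocab)`:

```python
import torch

# Continues from the snippet above (model, tokenizer, inputs).
# Assumption: outputs.logits follows the standard CausalLM layout.
with torch.no_grad():
    outputs = model(**inputs)

next_token_logits = outputs.logits[:, -1, :]      # logits at the last position
top = torch.topk(next_token_logits, k=5, dim=-1)  # five most likely next tokens
print(tokenizer.convert_ids_to_tokens(top.indices[0].tolist()))
```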

> ⚠️ **Note:** The HuggingFace interface is provided for training and forward-pass convenience. For optimized inference throughput, use the `wedlm` engine above.

## Performance

| Benchmark | Qwen3-8B | WeDLM-8B |
|:----------|:--------:|:--------:|
| ARC-C (0-shot) | 92.66 | **92.92** |
| GSM8K (3-shot) | 85.97 | **90.20** |
| MATH (4-shot) | 50.80 | **53.60** |
| HumanEval (4-shot) | 68.90 | **75.00** |
| MMLU (5-shot) | 74.03 | **75.46** |
| **Average** | 72.61 | **74.72** |

## Citation

Coming soon.

## License

Apache 2.0