dacorvo HF Staff committed on
Commit 2fdfa36 · verified · 1 Parent(s): 2c838fd

add cached qwen3-moe configurations

inference-cache-config/qwen3-moe.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "Qwen/Qwen3-30B-A3B-Instruct-2507": [
+     {
+       "batch_size": 1,
+       "sequence_length": 4096,
+       "num_cores": 8,
+       "auto_cast_type": "bf16"
+     },
+     {
+       "batch_size": 4,
+       "sequence_length": 4096,
+       "num_cores": 2,
+       "auto_cast_type": "bf16"
+     },
+     {
+       "batch_size": 8,
+       "sequence_length": 4096,
+       "num_cores": 2,
+       "auto_cast_type": "bf16"
+     }
+   ]
+ }
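
For reviewers who want to sanity-check the added file, here is a minimal Python sketch that parses the JSON from this commit and confirms every entry carries the same four keys. The helper name `load_cache_configs` and the `REQUIRED_KEYS` set are hypothetical, introduced only for illustration; the sketch assumes nothing about how the cache tooling itself consumes the file.

```python
import json

# The JSON added by this commit, embedded so the sketch is self-contained.
CONFIG_TEXT = """
{
  "Qwen/Qwen3-30B-A3B-Instruct-2507": [
    {"batch_size": 1, "sequence_length": 4096, "num_cores": 8, "auto_cast_type": "bf16"},
    {"batch_size": 4, "sequence_length": 4096, "num_cores": 2, "auto_cast_type": "bf16"},
    {"batch_size": 8, "sequence_length": 4096, "num_cores": 2, "auto_cast_type": "bf16"}
  ]
}
"""

# Assumption: every entry is expected to carry exactly these fields,
# inferred from the three entries in the diff.
REQUIRED_KEYS = {"batch_size", "sequence_length", "num_cores", "auto_cast_type"}


def load_cache_configs(text):
    """Parse the config text and return a flat list of (model_id, entry) pairs."""
    data = json.loads(text)
    pairs = []
    for model_id, entries in data.items():
        for entry in entries:
            missing = REQUIRED_KEYS - set(entry)
            if missing:
                raise ValueError(f"{model_id}: missing keys {sorted(missing)}")
            pairs.append((model_id, entry))
    return pairs


if __name__ == "__main__":
    for model_id, entry in load_cache_configs(CONFIG_TEXT):
        print(model_id, entry)
```

A check like this could run before merging so that a misspelled key fails fast instead of surfacing later, when the configuration is read.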