Update README.md
vllm model card update
README.md
CHANGED
@@ -120,12 +120,12 @@ print("generate_text:", generate_text)
 
 ```bash
 # 80G * 16 GPU
-vllm serve baidu/ERNIE-4.5-300B-A47B-PT --
+vllm serve baidu/ERNIE-4.5-300B-A47B-PT --tensor-parallel-size 16
 ```
 
 ```bash
-# FP8 online quantification 80G *
-vllm serve baidu/ERNIE-4.5-300B-A47B-PT --
+# FP8 online quantification 80G * 8 GPU
+vllm serve baidu/ERNIE-4.5-300B-A47B-PT --tensor-parallel-size 8 --quantization fp8
 ```
 
 ## Best Practices
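
As a rough sanity check on the GPU counts in the commands above, the weight footprint can be estimated with simple arithmetic. This is only a sketch under stated assumptions: the parameter count is taken from the "300B" in the model name, the unquantized checkpoint is assumed to store 16-bit weights (2 bytes per parameter), and FP8 is assumed to need 1 byte per parameter. KV cache, activations, and runtime overhead are ignored, so real requirements are higher than these figures.

```bash
#!/usr/bin/env bash
# Rough weight-memory estimate only; ignores KV cache, activations, and overhead.
# Assumption: ~300 billion parameters, read off the "300B" in the model name.
params_b=300      # parameters, in billions
gpu_mem_gb=80     # per-GPU memory, from the "80G" comments in the README

# Hypothetical helper: weight size in GB for a given number of bytes per parameter.
weights_gb() {
  echo $(( params_b * $1 ))
}

bf16_gb=$(weights_gb 2)   # assumed 16-bit weights: 2 bytes each
fp8_gb=$(weights_gb 1)    # FP8 weights: 1 byte each

echo "16-bit weights: ${bf16_gb} GB vs $(( 16 * gpu_mem_gb )) GB across 16 GPUs"
echo "FP8 weights:    ${fp8_gb} GB vs $(( 8 * gpu_mem_gb )) GB across 8 GPUs"
```

Under these assumptions, 16-bit weights alone would nearly fill eight 80 GB GPUs, which is consistent with the README pairing the unquantized command with 16 GPUs and the FP8 command with 8.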