Update README.md
README.md
CHANGED
@@ -349,13 +349,13 @@ Run the benchmarks under `vllm` root folder:
 ### baseline
 ```Shell
 export MODEL=Qwen/Qwen3-8B
-
+vllm bench latency --input-len 256 --output-len 256 --model $MODEL --batch-size 1
 ```
 
 ### AWQ-INT4
 ```Shell
 export MODEL=pytorch/Qwen3-8B-AWQ-INT4
-VLLM_DISABLE_COMPILE_CACHE=1
+VLLM_DISABLE_COMPILE_CACHE=1 vllm bench latency --input-len 256 --output-len 256 --model $MODEL --batch-size 1
 ```
 
 ## benchmark_serving
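
Reviewer note: the two commands added in this change differ only in the model id and the `VLLM_DISABLE_COMPILE_CACHE=1` prefix, so they could be generated from one helper. The following is a minimal sketch and not part of the commit. The names `latency_cmd`, `run_latency`, and `DRY_RUN` are hypothetical, and the only `vllm` flags used are the ones that appear in the diff. By default it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: latency_cmd, run_latency and DRY_RUN are illustrative names,
# not part of vLLM or of the README being changed.

latency_cmd() {
  # $1 = model id. Flags are copied from the README change.
  echo "vllm bench latency --input-len 256 --output-len 256 --model $1 --batch-size 1"
}

run_latency() {
  # $1 = model id
  # $2 = optional environment assignment placed in front of the command
  cmd="$(latency_cmd "$1")"
  if [ -n "${2:-}" ]; then
    cmd="$2 $cmd"
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: show what would be executed.
    echo "$cmd"
  else
    # Assumes vllm is installed and on PATH.
    eval "$cmd"
  fi
}

run_latency Qwen/Qwen3-8B
run_latency pytorch/Qwen3-8B-AWQ-INT4 VLLM_DISABLE_COMPILE_CACHE=1
```

Setting `DRY_RUN=0` before invoking the script would execute the benchmarks instead of printing them, under the assumption that `vllm` is available in the current environment.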