jerryzh168 committed · Commit 88d970d · verified · 1 Parent(s): de9462b

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -349,13 +349,13 @@ Run the benchmarks under `vllm` root folder:
  ### baseline
  ```Shell
  export MODEL=Qwen/Qwen3-8B
- python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model $MODEL --batch-size 1
+ vllm bench latency --input-len 256 --output-len 256 --model $MODEL --batch-size 1
  ```
 
  ### AWQ-INT4
  ```Shell
  export MODEL=pytorch/Qwen3-8B-AWQ-INT4
- VLLM_DISABLE_COMPILE_CACHE=1 python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model $MODEL --batch-size 1
+ VLLM_DISABLE_COMPILE_CACHE=1 vllm bench latency --input-len 256 --output-len 256 --model $MODEL --batch-size 1
  ```
 
  ## benchmark_serving