Scripts for vllm-model-bash efforts
Usage:

```bash
bash vllm_bench.sh config.yaml
```

This harness automates:

- Launching and monitoring `vllm serve` for multiple models
- Running `vllm bench serve` benchmarks with per-model overrides
- Collecting Nsight Systems (`nsys`) profiling traces
- Generating structured results and per-model summaries
Each benchmark run:
- Launches a vLLM server based on the YAML config
- Runs concurrency sweeps and collects latency/throughput metrics
- Optionally profiles GPU activity via Nsight Systems (`nsys`) and/or the PyTorch Profiler
- Produces organized outputs under a specified directory
Ideal for performance characterization, MLPerf inference testing, and multi-level GPU profiling at scale.
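Conceptually, each run wraps a flow like the sketch below. This is a simplified illustration (model name, port, and flags are placeholders); the real script adds monitoring, per-model overrides, and profiling hooks:

```bash
# Launch a vLLM server in the background (model and port are illustrative).
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000 &
SERVER_PID=$!

# Poll vLLM's health endpoint until the server is ready to accept requests.
until curl -sf http://localhost:8000/health > /dev/null; do
  sleep 5
done

# Run the benchmark sweep against the live server.
vllm bench serve --model meta-llama/Llama-3.1-8B-Instruct --port 8000

# Tear the server down once the sweep finishes.
kill "$SERVER_PID"
```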
Install these packages:
```bash
sudo apt-get install jq curl -y
pip install yq
```

For GPU profiling capabilities:
- Nsight Systems: System-wide performance analysis, CUDA graph tracing
- Nsight Compute: Detailed kernel-level analysis
- PyTorch Profiler: Python/PyTorch-level CPU and GPU profiling with memory tracking
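A quick way to confirm the profiling tools are present before a run (standard version checks, not part of the harness itself):

```bash
nsys --version                      # Nsight Systems CLI
ncu --version                       # Nsight Compute CLI
python -c "import torch.profiler"   # PyTorch Profiler ships with torch
```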
Nsight Systems captures system-wide GPU activity, CUDA graphs, and NVTX ranges. Example configuration:
```yaml
profiling:
  nsys_launch_args: "--trace=cuda,nvtx,osrt --cuda-graph-trace=node"
  nsys_start_args: "--force-overwrite=true --gpu-metrics-devices=cuda-visible"
```

Outputs: `.qdrep` files viewable in the Nsight Systems GUI
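For reference, those two argument strings correspond to an interactive `nsys` capture roughly like the following. This is an illustrative sketch of the equivalent manual workflow, not the harness's literal invocation:

```bash
# Launch the server under nsys injection; collection does not start yet.
nsys launch --trace=cuda,nvtx,osrt --cuda-graph-trace=node \
  vllm serve <model> --port 8000 &

# Begin capture around the benchmark window, then stop it.
nsys start --force-overwrite=true --gpu-metrics-devices=cuda-visible
vllm bench serve --model <model> --port 8000
nsys stop
```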
The PyTorch Profiler captures Python-level CPU/GPU activity, memory allocations, and operator traces. Example configuration:
```yaml
profiling:
  torch_profiler:
    enabled: true
    record_shapes: true    # Record tensor shapes
    profile_memory: true   # Track memory allocations
    with_stack: false      # Include Python stack traces
    with_flops: false      # Include FLOP estimates
```

Outputs:
- Chrome trace files (`.json`) - viewable in `chrome://tracing`
- PyTorch `.pt` trace files - loadable with `torch.load()`
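Since `jq` is already installed, a Chrome trace can be sanity-checked from the shell. The path below is illustrative; standard Chrome traces store their events under a top-level `traceEvents` key:

```bash
# Count the recorded profiler events in a Chrome trace (path is illustrative).
jq '.traceEvents | length' results/<model>/torch_profiler_trace.json
```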
You can enable both `nsys` and the PyTorch Profiler simultaneously:
```yaml
profile: true  # Enables nsys
profiling:
  nsys_launch_args: "--trace=cuda,nvtx,osrt --cuda-graph-trace=node"
  nsys_start_args: "--force-overwrite=true --gpu-metrics-devices=cuda-visible"
  torch_profiler:
    enabled: true
    record_shapes: true
    profile_memory: true
```
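With that saved as `config.yaml`, a fully profiled run and report inspection looks like the following (the report path is illustrative; the actual layout depends on the configured output directory):

```bash
bash vllm_bench.sh config.yaml

# Open the resulting Nsight Systems report in the GUI.
nsys-ui results/<model>/profile.qdrep
```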