# vLLM Metrics Visualizer

A visualization tool for vLLM metrics data collected by the vLLM Metrics Collector. It supports multiple storage formats and provides plotting capabilities for analysis and run comparison.

## Features

- **Multiple Storage Formats**: Support for JSON, CSV, SQLite, and Prometheus formats
- **Single Metric Plotting**: Plot individual metrics over time
- **Multiple Metrics Plotting**: Plot several metrics as subplots in one figure
- **Run Comparison**: Compare metrics between different runs
- **Summary Reports**: Generate statistical summaries of metrics
- **Command Line Interface**: Easy-to-use CLI for quick visualization
- **Programmatic API**: Use the visualizer from your own Python applications
- **Custom Styling**: Customize plots with different matplotlib styles and figure sizes

## Installation

1. Install the required dependencies:
```bash
pip install -r requirements.txt
```

2. Ensure you have metrics data collected with the vLLM Metrics Collector.

## Usage

### Command Line Interface

#### Basic Usage
```bash
# List available metrics
python vllm_metrics_visualizer.py --file metrics.json --list-metrics

# Plot a single metric
python vllm_metrics_visualizer.py --file metrics.json --metric "vllm:num_requests_running"

# Plot multiple metrics
python vllm_metrics_visualizer.py --file metrics.json --metrics "vllm:num_requests_running" "vllm:gpu_utilization"
```

#### Comparison Usage
```bash
# Compare a single metric between two runs
python vllm_metrics_visualizer.py \
    --file run1_metrics.json \
    --compare-file run2_metrics.json \
    --metric "vllm:request_latency" \
    --label1 "Baseline" \
    --label2 "Optimized"

# Compare multiple metrics
python vllm_metrics_visualizer.py \
    --file run1_metrics.json \
    --compare-file run2_metrics.json \
    --metrics "vllm:num_requests_running" "vllm:gpu_utilization" \
    --label1 "Baseline" \
    --label2 "Optimized"
```

#### Advanced Usage
```bash
# Generate a summary report
python vllm_metrics_visualizer.py --file metrics.json --summary report.json

# Save plots to file
python vllm_metrics_visualizer.py \
    --file metrics.json \
    --metric "vllm:gpu_utilization" \
    --save "gpu_utilization.png"

# Use a different storage format
python vllm_metrics_visualizer.py \
    --file metrics.db \
    --format sqlite \
    --metric "vllm:num_requests_running"
```

### Programmatic Usage

#### Basic Plotting
```python
from vllm_metrics_visualizer import VLLMMetricsVisualizer

# Create visualizer
visualizer = VLLMMetricsVisualizer()

# Plot a single metric
visualizer.plot_metric(
    file_path="metrics.json",
    metric_name="vllm:num_requests_running",
    title="Running Requests Over Time"
)

# Plot multiple metrics
visualizer.plot_multiple_metrics(
    file_path="metrics.csv",
    metric_names=["vllm:num_requests_running", "vllm:gpu_utilization"],
    title="Performance Metrics"
)
```

#### Comparison Plotting
```python
# Compare a single metric
visualizer.compare_metrics(
    file_path1="run1_metrics.json",
    file_path2="run2_metrics.json",
    metric_name="vllm:request_latency",
    label1="Baseline Run",
    label2="Optimized Run"
)

# Compare multiple metrics
visualizer.compare_multiple_metrics(
    file_path1="baseline_metrics.csv",
    file_path2="optimized_metrics.csv",
    metric_names=["vllm:num_requests_running", "vllm:gpu_utilization"],
    label1="Baseline",
    label2="Optimized"
)
```

#### Data Analysis
```python
# Load metrics data
df = visualizer.load_metrics("metrics.json")

# Get available metrics
metrics = visualizer.get_available_metrics("metrics.json")

# Generate a summary report
summary = visualizer.generate_summary_report("metrics.json", "summary.json")
```

### Custom Styling

```python
# Create a visualizer with custom styling
visualizer = VLLMMetricsVisualizer(
    style='seaborn-v0_8-darkgrid',
    figsize=(15, 10)
)

# The custom styling applies to all plots
visualizer.plot_metric(
    file_path="metrics.json",
    metric_name="vllm:gpu_utilization",
    title="GPU Utilization with Custom Styling"
)
```

## Supported Storage Formats

### JSON Format
```python
# JSON metrics file
visualizer.plot_metric("metrics.json", "vllm:num_requests_running")
```

### CSV Format
```python
# CSV metrics file
visualizer.plot_metric("metrics.csv", "vllm:gpu_utilization", format_type="csv")
```

### SQLite Format
```python
# SQLite database
visualizer.plot_metric("metrics.db", "vllm:request_latency", format_type="sqlite")
```

### Prometheus Format
```python
# Prometheus format file
visualizer.plot_metric("metrics.prom", "vllm_request_duration_seconds_bucket", format_type="prometheus")
```

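The Prometheus text exposition format is line-oriented, with optional `{label="value"}` pairs between the metric name and the sample value. As an illustration of what loading such a file involves (this is a hedged sketch, not the visualizer's actual loader), a minimal parser might look like:

```python
import re

# Matches one Prometheus sample line: name, optional {labels}, value.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>[^}]*)\})?'            # optional {label="value",...}
    r'\s+(?P<value>\S+)'                     # sample value
)

def parse_prometheus_text(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):  # skip blank, HELP, and TYPE lines
            continue
        m = LINE_RE.match(line)
        if not m:
            continue
        labels = dict(re.findall(r'(\w+)="([^"]*)"', m.group('labels') or ''))
        yield m.group('name'), labels, float(m.group('value'))
```

For example, `list(parse_prometheus_text('vllm_num_requests_running{model="llama"} 5.0'))` returns `[('vllm_num_requests_running', {'model': 'llama'}, 5.0)]`.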
## Command Line Options

- `--file`: Path to the metrics file (required)
- `--format`: Storage format (`json`/`csv`/`sqlite`/`prometheus`; auto-detected if not specified)
- `--metric`: Single metric to plot
- `--metrics`: Multiple metrics to plot (space-separated)
- `--compare-file`: Second file for comparison
- `--compare-format`: Storage format of the comparison file
- `--label1`: Label for the first run (default: "Run 1")
- `--label2`: Label for the second run (default: "Run 2")
- `--title`: Plot title
- `--save`: Path to save the plot
- `--summary`: Path to write a summary report (JSON format)
- `--list-metrics`: List the metrics available in the file

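The options above map naturally onto `argparse`. The following sketch shows how such a CLI could be declared; option names are taken from the list above, but the actual `vllm_metrics_visualizer.py` implementation may differ in details:

```python
import argparse

# Illustrative argparse setup mirroring the documented options;
# not necessarily identical to the real script's parser.
def build_parser():
    p = argparse.ArgumentParser(description="Visualize vLLM metrics")
    formats = ["json", "csv", "sqlite", "prometheus"]
    p.add_argument("--file", required=True, help="Path to the metrics file")
    p.add_argument("--format", choices=formats,
                   help="Storage format (auto-detected from extension if omitted)")
    p.add_argument("--metric", help="Single metric to plot")
    p.add_argument("--metrics", nargs="+", help="Multiple metrics to plot")
    p.add_argument("--compare-file", help="Second file for comparison")
    p.add_argument("--compare-format", choices=formats,
                   help="Storage format of the comparison file")
    p.add_argument("--label1", default="Run 1", help="Label for the first run")
    p.add_argument("--label2", default="Run 2", help="Label for the second run")
    p.add_argument("--title", help="Plot title")
    p.add_argument("--save", help="Path to save the plot")
    p.add_argument("--summary", help="Path to write a summary report (JSON)")
    p.add_argument("--list-metrics", action="store_true",
                   help="List the metrics available in the file")
    return p
```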
## Examples

See `visualizer_examples.py` for comprehensive usage examples, including:
- Single metric plotting
- Multiple metrics plotting
- Run comparison
- Different storage formats
- Summary report generation
- Custom styling
- Programmatic usage

## Output Formats

### Plots
- High-resolution PNG images (300 DPI)
- Customizable titles and labels
- Support for different matplotlib styles
- Automatic legend generation for labeled metrics

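Producing a labeled 300 DPI PNG is standard matplotlib usage. A minimal sketch of this kind of output (hypothetical sample data; the Agg backend is selected so no display is required):

```python
import matplotlib
matplotlib.use("Agg")  # render to file without a display
import matplotlib.pyplot as plt

# Hypothetical metric samples, standing in for collected data
timestamps = range(10)
values = [5, 6, 5, 7, 8, 6, 5, 4, 6, 7]

# Plot the series with a label so a legend is generated,
# then save a high-resolution (300 DPI) PNG.
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(timestamps, values, label="vllm:num_requests_running")
ax.set_xlabel("Time")
ax.set_ylabel("Value")
ax.set_title("Running Requests Over Time")
ax.legend()
fig.savefig("example_plot.png", dpi=300, bbox_inches="tight")
plt.close(fig)
```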
### Summary Reports
JSON format with statistical information:
```json
{
  "file_path": "metrics.json",
  "total_records": 1000,
  "time_range": {
    "start": "2024-01-01T00:00:00",
    "end": "2024-01-01T01:00:00"
  },
  "metrics": {
    "vllm:num_requests_running": {
      "count": 100,
      "mean": 5.2,
      "std": 1.8,
      "min": 0.0,
      "max": 10.0,
      "median": 5.0
    }
  }
}
```

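The per-metric fields in the report above can be reproduced with the standard library. A sketch of the computation on hypothetical samples (not the visualizer's actual implementation):

```python
import json
import statistics

def summarize(values):
    """Compute the same statistics a summary report entry contains."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }

# Hypothetical samples for one metric
samples = [4.0, 5.0, 5.0, 6.0, 7.0]
report = {"metrics": {"vllm:num_requests_running": summarize(samples)}}
print(json.dumps(report, indent=2))
```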
## Integration with Metrics Collector

The visualizer works seamlessly with the vLLM Metrics Collector:

1. **Collect metrics** using the metrics collector:
```bash
python vllm_metrics_collector.py --storage-type json --output metrics
```

2. **Visualize metrics** using the visualizer:
```bash
python vllm_metrics_visualizer.py --file metrics.json --metric "vllm:num_requests_running"
```

3. **Compare runs** by collecting metrics from different configurations:
```bash
# Run 1
python vllm_metrics_collector.py --storage-type csv --output run1_metrics

# Run 2 (different configuration)
python vllm_metrics_collector.py --storage-type csv --output run2_metrics

# Compare
python vllm_metrics_visualizer.py \
    --file run1_metrics.csv \
    --compare-file run2_metrics.csv \
    --metrics "vllm:gpu_utilization" "vllm:request_latency"
```

## Requirements

- Python 3.7+
- pandas>=1.5.0
- matplotlib>=3.5.0
- seaborn>=0.11.0
- numpy>=1.21.0

## License

This project is open source and available under the MIT License.