> Use [Experiment Classification](#experiment-classification) to assign semantic colors (grey for baselines, green for treatments) and improve visual distinction in multi-run comparisons.

**Generated plots (3 default):**

2. **TTFT vs Throughput** - Time to first token vs request throughput across concurrency levels
4. **Token Throughput per GPU vs Latency** - GPU efficiency vs latency (when GPU telemetry available)

Multi-run comparison plots can group runs in different ways to create different colored lines/series. You can customize this in `~/.aiperf/plot_config.yaml`.
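
For orientation, here is a rough sketch of how the pieces covered in this section fit together in `~/.aiperf/plot_config.yaml`. The key names come from the examples below, but treat the overall layout as an assumption and see `src/aiperf/plot/default_plot_config.yaml` for the authoritative schema:

```yaml
# Sketch only: layout assumed from the examples in this section.
experiment_classification:   # optional: classify runs for semantic colors
  baselines:
    - "*baseline*"
  treatments:
    - "*treatment*"

multi_run_plots:             # per-plot presets
  pareto_curve_throughput_per_gpu_vs_latency:
    groups: [run_name]       # default: one line per run
```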
### Default Grouping Behavior

**Without experiment classification:**

- **Default**: Each run gets its own line (groups by `run_name`)
- **Customize**: Edit `groups:` in plot presets to group by other fields like `[model]`, `[experiment_group]`, or `[concurrency]`

**With experiment classification enabled:**

- **Always**: Groups by `experiment_group` (directory name) with semantic baseline/treatment colors
- This override ensures treatment variants are preserved as separate lines

### Example Customizations

**Group by model** (useful when comparing different models):

```yaml
multi_run_plots:
  pareto_curve_throughput_per_gpu_vs_latency:
    # ... other settings ...
    groups: [model]
```

**Group by directory** (useful for hierarchical experiment structures):

```yaml
multi_run_plots:
  pareto_curve_throughput_per_gpu_vs_latency:
    # ... other settings ...
    groups: [experiment_group]
```

**Group by run name** (default - each run is a separate line):

```yaml
multi_run_plots:
  pareto_curve_throughput_per_gpu_vs_latency:
    # ... other settings ...
    groups: [run_name]
```

> [!TIP]
> Edit `~/.aiperf/plot_config.yaml` to customize grouping. See the file's CONFIGURATION GUIDE section for detailed examples and options.

## Experiment Classification

Classify runs as "baseline" or "treatment" for semantic color assignment in multi-run comparisons:

- **Baselines**: Grey shades, listed first in legend
- **Treatments**: NVIDIA green shades, listed after baselines
- **Use case**: Clear visual distinction for A/B testing and performance comparisons

### Configuration

Edit `~/.aiperf/plot_config.yaml` (auto-created on first run):

```yaml
experiment_classification:
  baselines:
    - "*baseline*"   # Glob patterns
    - "*_agg_*"
  treatments:
    - "*treatment*"
    - "*_disagg_*"
  default: treatment   # Fallback when no match
```

> [!IMPORTANT]
> When experiment classification is enabled, **all multi-run plots automatically group by experiment_group** (directory name). This preserves individual treatment variants while applying semantic baseline/treatment colors. This behavior overrides any explicit `groups:` settings in the config.
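
To make the override concrete, here is a sketch (hypothetical values; behavior as described in the note above):

```yaml
experiment_classification:
  baselines: ["*baseline*"]
  treatments: ["*treatment*"]

multi_run_plots:
  pareto_curve_throughput_per_gpu_vs_latency:
    # While experiment_classification is enabled, this explicit setting is
    # ignored and the plot groups by experiment_group instead:
    groups: [model]
```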

**Pattern notes**: Patterns use glob syntax (`*` = wildcard), matching is case-sensitive, and the first matching pattern wins.

For example, with the default patterns above, hypothetical run directories `vllm_agg_base/` (matches `*_agg_*` → baseline) and `vllm_disagg_v1/`, `vllm_disagg_v2/`, `vllm_disagg_v3/` (match `*_disagg_*` → treatments) classify as one baseline and three treatments.

**Result**: 4 lines in plots (1 baseline line + 3 treatment lines, each with semantic colors)

**Advanced**: Use `group_extraction_pattern` to aggregate variants into named groups:

- Pattern `"^(treatment_\d+)"` groups `treatment_1_variantA` + `treatment_1_variantB` → `"treatment_1"`
- See the config file for `group_display_names` and other options, as sketched below
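
A minimal sketch of these advanced options (their placement under `experiment_classification` and the shape of the `group_display_names` mapping are assumptions; see the config file for the real schema):

```yaml
experiment_classification:
  treatments:
    - "treatment_*"
  # Assumed placement: collapse treatment_1_variantA and treatment_1_variantB
  # into a single "treatment_1" series via the first regex capture group.
  group_extraction_pattern: '^(treatment_\d+)'
  # Assumed format: map extracted group names to friendlier legend labels.
  group_display_names:
    treatment_1: "Treatment 1 (all variants)"
```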

> [!TIP]
> See `src/aiperf/plot/default_plot_config.yaml` for detailed options.

## Theme Options

Choose between light and dark themes for your plots:

> **Consistent Configurations**: When comparing runs, keep all parameters identical except the one you're testing (e.g., only vary concurrency). This ensures plots show the impact of that specific parameter.
> Future features in interactive mode will allow pop-ups to show specific configurations of plotted runs.

> [!TIP]
> **Use Experiment Classification**: For multi-run comparisons, configure [experiment classification](#experiment-classification) to distinguish baselines from treatments with semantic colors. This makes it easier to identify reference points and experimental variations.

> [!TIP]
> **Include Warmup**: Use `--warmup-request-count` to ensure the server reaches steady state before measurement. This reduces noise in your visualizations.