
Conversation


@ilana-n ilana-n commented Dec 1, 2025

Plot Configuration & Visualization Enhancements

Summary

This PR builds on top of #459 to introduce configurable YAML-based plot generation and experiment classification with default coloring.

Key Additions

Plot Generation System

  • Optimal direction indicators: Automatic axis labels with ↑↓ indicators and optimal quadrant shading

Configurable YAML System

  • default_plot_config.yaml: Centralized configuration for all plot types
  • Experiment classification: Default color assignment for baseline/treatment comparison
    • Groups by experiment_group (directory name) to preserve treatment variants
    • Baselines: Grey shades, listed first in legend
    • Treatments: NVIDIA green for first, distinct colors for remaining
    • Automatically overrides the groups: setting when classification is enabled
  • Smart default grouping:
    • Without classification: Defaults to run_name (each run is a separate line)
    • With classification: Forced to experiment_group for semantic coloring
  • User customization: Auto-creates ~/.aiperf/plot_config.yaml for users to modify (see the config sketch after this list)
  • Verbose mode: --verbose flag for detailed logging with error tracebacks
  • Improved logging: Logs saved to output directory alongside plots
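
For a concrete feel of the workflow, here is a minimal sketch modeled on the unit tests in this PR. The preset name and metric choices are illustrative, and only the type/x/y keys shown in those tests are assumed:

```python
from pathlib import Path

from aiperf.plot.config import PlotConfig

# Hypothetical minimal config: one multi-run scatter preset, no single-run plots.
config_file = Path("my_plot_config.yaml")
config_file.write_text("""
visualization:
  multi_run_defaults:
    - latency_vs_throughput
  multi_run_plots:
    latency_vs_throughput:
      type: scatter
      x: request_latency_p50
      y: request_throughput_avg
  single_run_defaults: []
  single_run_plots: {}
""")

config = PlotConfig(config_file)           # custom path takes priority over ~/.aiperf and packaged defaults
specs = config.get_multi_run_plot_specs()  # expands each named preset into a PlotSpec
print(len(specs), specs[0].metrics[0].stat)  # 1 p50 (stat parsed from the metric suffix)
```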

Example Plots:
pareto_curve_throughput_per_gpu_vs_interactivity

ttft_vs_throughput

Summary by CodeRabbit

Release Notes

  • New Features

    • Added experiment classification to organize runs as baselines or treatments with semantic coloring
    • Introduced configuration-driven plotting system with custom plot grouping and multi-run/single-run presets
    • Added metric direction indicators (↑/↓) and optimal quadrant shading for scatter/Pareto plots
    • Enhanced group display customization and experiment group extraction for clearer visualizations
    • Added verbose mode for detailed error reporting
  • Documentation

    • New tutorial sections on customizing plot grouping and experiment classification for multi-run analysis



github-actions bot commented Dec 1, 2025

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ilana/plot_configure_yaml

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ilana/plot_configure_yaml

@github-actions github-actions bot added the feat label Dec 1, 2025
@ilana-n ilana-n requested a review from Copilot December 1, 2025 22:09

ilana-n commented Dec 1, 2025

@coderabbitai review


coderabbitai bot commented Dec 1, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


Copilot AI left a comment


Pull request overview

This PR introduces a comprehensive experiment classification and plot configuration system for AIPerf, enabling users to customize visualizations and distinguish between baseline and treatment runs with semantic coloring.

Key changes:

  • Added YAML-based plot configuration system with auto-creation of user config on first run
  • Implemented experiment classification to categorize runs as baseline/treatment with semantic colors
  • Enhanced multi-run plots with optimal direction indicators and quadrant shading

Reviewed changes

Copilot reviewed 24 out of 33 changed files in this pull request and generated 5 comments.

Summary per file
File Description
tests/unit/plot/test_png_exporter.py Updated tests to use config-driven plot specs instead of hardcoded constants
tests/unit/plot/test_plot_generator.py Added tests for experiment group coloring and display name handling
tests/unit/plot/test_plot_controller.py Updated warning message assertion for failed run loading
tests/unit/plot/test_plot_config.py New comprehensive test suite for YAML config loading and validation
tests/unit/plot/test_experiment_classification.py New test suite for baseline/treatment classification logic
tests/unit/plot/test_cli_runner.py Updated test to include new verbose and config_path parameters
src/aiperf/plot/plot_controller.py Added config and classification support to controller
src/aiperf/plot/metric_names.py Added helper functions to retrieve metrics by data source
src/aiperf/plot/logging.py Enhanced logging with console-only fallback and verbose mode support
src/aiperf/plot/handlers/multi_run_handlers.py Added experiment type and display name extraction for plots
src/aiperf/plot/exporters/png/single_run.py Updated to accept plot specs as parameter
src/aiperf/plot/exporters/png/multi_run.py Added classification config support and experiment group handling
src/aiperf/plot/default_plot_config.yaml New default YAML configuration file with plot presets
src/aiperf/plot/core/plot_specs.py Added ExperimentClassificationConfig and list validation for grouping fields
src/aiperf/plot/core/plot_generator.py Added semantic coloring, optimal direction indicators, and quadrant shading
src/aiperf/plot/core/data_loader.py Implemented experiment classification and group extraction logic
src/aiperf/plot/constants.py Added derived metric direction mappings
src/aiperf/plot/config.py New PlotConfig class for YAML-based configuration loading
src/aiperf/plot/cli_runner.py Added config and verbose parameters
src/aiperf/common/enums/metric_enums.py Added PlotMetricDirection enum
src/aiperf/common/enums/__init__.py Exported PlotMetricDirection
src/aiperf/cli_utils.py Added show_traceback parameter to exit_on_error
src/aiperf/cli.py Added config and verbose flags to plot command
docs/tutorials/plot.md Added documentation for experiment classification and grouping



coderabbitai bot commented Dec 1, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This pull request adds YAML-based plot configuration management, experiment classification (baseline/treatment grouping), and enhanced visualization features including metric directionality indicators and optimal quadrant shading to NVIDIA AIPerf's plotting system.

Changes

Cohort / File(s) Summary
Documentation & Configuration Defaults
docs/tutorials/plot.md, src/aiperf/plot/default_plot_config.yaml
Introduces comprehensive plotting documentation covering custom plot grouping, experiment classification, and multi-run plot presets. New YAML configuration file defines default plots for both multi-run and single-run analyses with three predefined multi-run plots and six single-run time-series plots.
CLI & Configuration Infrastructure
src/aiperf/cli.py, src/aiperf/cli_utils.py, src/aiperf/plot/cli_runner.py, src/aiperf/plot/config.py
Adds config and verbose parameters to CLI plot command. Introduces PlotConfig class for YAML-based configuration management with three-tier resolution strategy (custom → user home → defaults). Implements error handling with optional traceback display via exit_on_error enhancements.
Enums & Constants
src/aiperf/common/enums/metric_enums.py, src/aiperf/common/enums/__init__.py, src/aiperf/plot/constants.py
Introduces PlotMetricDirection enum (HIGHER/LOWER) for semantic metric directionality and exports it via public API. Adds DERIVED_METRIC_DIRECTIONS mapping for token throughput metrics.
Data Layer & Classification
src/aiperf/plot/core/data_loader.py
Extends RunMetadata with experiment_type and experiment_group fields. Integrates ExperimentClassificationConfig to classify runs as baselines or treatments via glob pattern matching and extract experiment groups from run paths.
Visualization Core
src/aiperf/plot/core/plot_specs.py, src/aiperf/plot/core/plot_generator.py, src/aiperf/plot/metric_names.py
Adds ExperimentClassificationConfig schema. Refactors PlotSpec to accept label_by/group_by as lists with normalization. Removes hardcoded plot specifications. Enhances PlotGenerator with directionality-aware axis labels, optimal quadrant shading, group coloring based on classification, and metric direction utility functions. Adds metric category getter functions.
Exporters
src/aiperf/plot/exporters/png/multi_run.py, src/aiperf/plot/exporters/png/single_run.py
Updates export signatures to accept plot_specs list and optional classification_config. Removes reliance on hardcoded spec constants. Extracts experiment types and group display names from DataFrames for visualization.
Handlers & Logging
src/aiperf/plot/handlers/multi_run_handlers.py, src/aiperf/plot/logging.py, src/aiperf/plot/plot_controller.py
Adds helper methods to extract experiment types and display names in multi-run handlers. Introduces console-only logging setup and enhanced plot logging. Integrates PlotConfig into PlotController with classification and plot specs wiring.
Tests
tests/unit/plot/test_*.py
Adds comprehensive test suites for experiment classification logic, PlotConfig loading and validation, plot controller integration, and exporter functionality. Updates existing tests to accommodate new return values and API signatures.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

  • PlotConfig implementation: Complex YAML loading, validation, metric expansion, and three-tier path resolution logic
  • Experiment classification system: Pattern matching, group extraction, and metadata enrichment across DataLoader
  • Plot visualization enhancements: Multiple new utility functions for directionality, quadrant shading, and color mapping logic
  • API signature changes: Widespread updates to exporter and handler signatures introduce surface area for compatibility issues
  • Test coverage quality: Extensive tests provide confidence but require careful review for assertion correctness, especially around new three-return-value unpacking in _prepare_groups

Areas requiring extra attention:

  • Metric expansion logic in PlotConfig (_expand_metric_shortcut and shortcut resolution paths)
  • Experiment classification pattern matching and group extraction edge cases (concurrency directory handling, path resolution)
  • PlotGenerator's new directional shading and coloring logic, particularly baseline/treatment color assignment and optimal quadrant overlay implementation
  • Test assertions validating title composition with directional subtitles and arrow characters
  • Default vs. explicit configuration precedence and cascade behavior in PlotConfig

Poem

🐰 A config-driven tale unfolds,
With YAML treasures and stories bold,
Baselines in gray, treatments in green,
The finest plots you've ever seen!
Arrows point where metrics gleam—
Experiment classification's dream!

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 97.48% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title 'feat: AIPerf plot configuration system' accurately and concisely describes the main feature being added—a plot configuration system for AIPerf. It aligns with the primary changes across all modified files.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (28)
src/aiperf/plot/logging.py (3)

24-55: Global handler reset in setup_console_only_logging can clobber prior logging config

The implementation is fine for a standalone CLI, but iterating over root_logger.handlers[:] and calling removeHandler on each will drop any previously configured root handlers. If aiperf.plot is ever used as a library inside a larger process with its own logging, this will override that configuration.

If you want to keep this reusable, consider either of the following (a sketch of the first option appears after this list):

  • Operating on a dedicated logger hierarchy instead of the root, or
  • Only clearing handlers you know you own (e.g., marked by a custom attribute).
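
A minimal sketch of the first option, confined to the aiperf.plot logger subtree. The signature and formatter here are illustrative, not the module's current API:

```python
import logging

def setup_console_only_logging(level: int = logging.WARNING) -> logging.Logger:
    """Configure a console handler on the 'aiperf.plot' subtree without touching the root logger."""
    plot_logger = logging.getLogger("aiperf.plot")
    plot_logger.setLevel(level)
    plot_logger.propagate = False  # keep plot records out of whatever handlers the host app configured

    # Reset handlers on this subtree only; root handlers stay as the embedding application left them.
    for handler in list(plot_logger.handlers):
        plot_logger.removeHandler(handler)

    console = logging.StreamHandler()
    console.setLevel(level)
    console.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    plot_logger.addHandler(console)
    return plot_logger
```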

61-66: Console verbosity docs vs implementation are slightly out of sync

Docstring says, “Console output shows WARNING+ by default, or all logs when verbose (DEBUG/INFO level).” The current logic:

console_level = level if level == "DEBUG" else "WARNING"

means only log_level=="DEBUG" yields full console output; INFO still shows WARNING+. That’s fine for the current PlotController usage (DEBUG when verbose, INFO otherwise), but the function’s docs suggest INFO could also be “verbose”.

Consider tightening either:

  • The implementation (e.g., treat INFO as console-level INFO as well), or
  • The docstring (e.g., say “all logs when log_level=='DEBUG', otherwise WARNING+”).

79-81: Minor: prefer logging level constants instead of strings

console_level is currently a string ("DEBUG"/"WARNING"). logging happily accepts that, but using logging.DEBUG / logging.WARNING would be more idiomatic and avoids surprises if someone ever passes an unexpected string to log_level.

Not urgent, but worth considering for readability and static tooling.

Also applies to: 91-91

src/aiperf/plot/metric_names.py (2)

33-57: Minor style: align list construction with Ruff’s iterable-unpacking suggestion

The constructions:

_AGGREGATED_METRICS: list[str] = MetricRegistry.tags_applicable_to(...) + [
    "output_token_throughput_per_gpu",
]

_REQUEST_METRICS: list[str] = (
    MetricRegistry.tags_applicable_to(...) + [
        "request_number",
        ...
    ]
)

work fine, but Ruff’s RUF005 hint is suggesting iterable unpacking, which is a bit more idiomatic and avoids an intermediate list for concatenation. For example:

_AGGREGATED_METRICS: list[str] = [
    *MetricRegistry.tags_applicable_to(
        MetricFlags.NONE,
        MetricFlags.NONE,
        MetricType.RECORD,
        MetricType.DERIVED,
    ),
    "output_token_throughput_per_gpu",
]

_REQUEST_METRICS: list[str] = [
    *MetricRegistry.tags_applicable_to(
        MetricFlags.NONE,
        MetricFlags.NONE,
        MetricType.RECORD,
    ),
    "request_number",
    "timestamp",
    "timestamp_s",
    "throughput_tokens_per_sec",
    "active_requests",
]

Purely a readability/micro-optimization tweak; behavior is unchanged.


59-69: Guard against accidental mutation of module-level metric lists

All four getters:

def get_aggregated_metrics() -> list[str]:
    return _AGGREGATED_METRICS

def get_request_metrics() -> list[str]:
    return _REQUEST_METRICS

def get_timeslice_metrics() -> list[str]:
    return _TIMESLICE_METRICS

def get_gpu_metrics() -> list[str]:
    return _GPU_METRICS

return references to module-level lists. Any caller doing get_aggregated_metrics().append(...) will mutate the shared global state, which could lead to hard-to-track bugs in plotting/validation.

If you intend these as read-only definitions, consider one of:

  • Keep internal storage as tuples and return lists:
_AGGREGATED_METRICS: tuple[str, ...] = (...)
# ...
def get_aggregated_metrics() -> list[str]:
    return list(_AGGREGATED_METRICS)
  • Or keep them as lists but return copies:
def get_gpu_metrics() -> list[str]:
    return list(_GPU_METRICS)

Either way, callers can safely modify the returned list without affecting global configuration.

Also applies to: 110-187

src/aiperf/plot/cli_runner.py (1)

52-54: Logs: line may be misleading when logging falls back to console-only

You always print:

print(f"Logs: {output_dir / PLOT_LOG_FILE}")

But PlotController can fall back to setup_console_only_logging if file logging fails (e.g., permission issues), in which case that log file may not exist.

If you want to avoid confusing users in that scenario, consider one of the following (a small guard sketch follows this list):

  • Having PlotController.run() (or the controller instance) expose whether file logging was successfully configured and only print the log path when it is, or
  • Adjusting the message, e.g., Logs (if file logging enabled): ... or similar.
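
A third, low-effort variant is to check for the file before advertising it. A sketch (the helper name and log filename are illustrative, and it cannot distinguish a stale log from this run's log):

```python
from pathlib import Path

def print_log_location(output_dir: Path, log_filename: str = "plot.log") -> None:
    """Only point users at the log file if file logging actually produced one."""
    log_file = output_dir / log_filename
    if log_file.exists():
        print(f"Logs: {log_file}")
    else:
        print("Logs: console only (file logging was not enabled)")
```
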
src/aiperf/plot/constants.py (1)

63-68: DERIVED_METRIC_DIRECTIONS is a good central hook for non-registry metrics

The new mapping for derived metrics:

DERIVED_METRIC_DIRECTIONS = {
    "output_token_throughput_per_gpu": True,
    "output_token_throughput_per_user": True,
}

plus the explanatory comment about True/False semantics provides a clear, centralized way for plotting code to infer direction for metrics that don’t live in MetricRegistry.

Just keep in mind that any future derived metrics added in data processing should also be added here (and to the display-name layer) to avoid silent “unknown direction” cases in plotting.
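
For illustration, a sketch of how a caller can consult the registry and fall back to this mapping. The MetricRegistry import path and its KeyError behavior for unknown tags are assumptions:

```python
from aiperf.common.enums.metric_enums import MetricFlags, PlotMetricDirection
from aiperf.common import MetricRegistry  # import path is an assumption
from aiperf.plot.constants import DERIVED_METRIC_DIRECTIONS

def resolve_direction(metric_tag: str) -> PlotMetricDirection | None:
    """Derived-metric mapping first, then the registry flag; None when the direction is unknown."""
    if metric_tag in DERIVED_METRIC_DIRECTIONS:
        higher_is_better = DERIVED_METRIC_DIRECTIONS[metric_tag]
    else:
        try:
            metric_class = MetricRegistry.get_class(metric_tag)
        except KeyError:  # assumed behavior for unregistered tags
            return None
        higher_is_better = metric_class.has_flags(MetricFlags.LARGER_IS_BETTER)
    return PlotMetricDirection.HIGHER if higher_is_better else PlotMetricDirection.LOWER
```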

tests/unit/plot/test_plot_generator.py (6)

361-367: Prefix unused unpacked variables with underscore.

Static analysis correctly identifies that groups and display_names are unpacked but not used in this test. Per Python convention, prefix unused variables with _.

-        groups, color_map, display_names = plot_generator._prepare_groups(df, "model")
+        _groups, color_map, _display_names = plot_generator._prepare_groups(df, "model")

382-392: Prefix unused unpacked variables with underscore.

Same issue: groups1, groups2, display_names1, and display_names2 are unpacked but never used.

-        groups1, colors1, display_names1 = plot_generator._prepare_groups(df1, "model")
+        _groups1, colors1, _display_names1 = plot_generator._prepare_groups(df1, "model")

         df2 = pd.DataFrame({"model": ["ModelX", "ModelY", "ModelZ"]})
-        groups2, colors2, display_names2 = plot_generator._prepare_groups(df2, "model")
+        _groups2, colors2, _display_names2 = plot_generator._prepare_groups(df2, "model")

399-409: Prefix unused unpacked variables with underscore.

-        groups, color_map, display_names = plot_generator._prepare_groups(df, "model")
+        _groups, color_map, _display_names = plot_generator._prepare_groups(df, "model")

1083-1087: Prefix unused unpacked variable with underscore.

-        groups, color_map, display_names = plot_gen._prepare_groups(df, group_by=None)
+        groups, color_map, _display_names = plot_gen._prepare_groups(df, group_by=None)

1096-1106: Prefix unused unpacked variables with underscore.

-        groups, color_map, display_names = plot_gen._prepare_groups(df, "model")
+        _groups, color_map, _display_names = plot_gen._prepare_groups(df, "model")

1127-1133: Prefix unused unpacked variable with underscore.

-        groups, color_map, display_names = plot_gen._prepare_groups(
+        groups, color_map, _display_names = plot_gen._prepare_groups(
             df, "nonexistent_column"
         )
docs/tutorials/plot.md (1)

223-288: Add language specifier to code block.

The directory structure code block at line 253 should have a language specifier for consistency.

-```
+```text
 artifacts/
 ├── baseline_moderate_io_isl100_osl200_streaming/           # → Grey (baseline; ISL=100, OSL=200)
src/aiperf/plot/core/plot_specs.py (1)

135-160: Consider using TypeError for type validation failures.

Static analysis suggests TypeError is more appropriate than ValueError when the issue is an incorrect type. This aligns with Python conventions where TypeError indicates wrong argument types.

         if isinstance(v, str):
-            raise ValueError(
+            raise TypeError(
                 f"label_by and group_by must be provided as lists of strings, "
                 f"not plain strings. Use [{v!r}] instead of {v!r}"
             )
         if isinstance(v, list):
             return " | ".join(v)
-        raise ValueError(
+        raise TypeError(
             f"label_by and group_by must be lists of strings, got {type(v).__name__}"
         )
src/aiperf/plot/plot_controller.py (1)

61-66: Inconsistent logging: uses print() after setting up logger.

The code sets up a logger but then uses print() for the classification message. For consistency with the logging infrastructure, use the logger instead.

-        if classification_config:
-            print("Experiment classification enabled: grouping runs by baseline/treatment patterns")
+        if classification_config:
+            logger.info("Experiment classification enabled: grouping runs by baseline/treatment patterns")
tests/unit/plot/test_png_exporter.py (1)

45-196: Comprehensive fixture for single-run plot specs.

The sample_plot_specs fixture provides good coverage of various plot types (scatter, area, histogram, dual_axis) with proper metric specifications. The inline import of Style and TimeSlicePlotSpec at line 48 could be moved to the top-level imports for consistency.

Consider moving the import to the top of the file:

 from aiperf.plot.core.plot_specs import (
     DataSource,
     MetricSpec,
     PlotSpec,
     PlotType,
+    Style,
+    TimeSlicePlotSpec,
 )
src/aiperf/plot/exporters/png/multi_run.py (3)

37-43: Missing type hint for classification_config parameter.

The classification_config parameter lacks a type annotation. Based on the codebase context, this should be typed as ExperimentClassificationConfig | None.

As per coding guidelines requiring type hints on all parameters:

+from aiperf.plot.core.plot_specs import ExperimentClassificationConfig, PlotSpec
-from aiperf.plot.core.plot_specs import PlotSpec

     def export(
         self,
         runs: list[RunData],
         available_metrics: dict,
         plot_specs: list[PlotSpec],
-        classification_config=None,
+        classification_config: ExperimentClassificationConfig | None = None,
     ) -> list[Path]:

120-122: available_metrics parameter is unused in _runs_to_dataframe.

The static analysis correctly identifies that available_metrics is passed but not used. Either remove it or document why it's kept for future use.

     def _runs_to_dataframe(
-        self, runs: list[RunData], available_metrics: dict, classification_config=None
+        self, runs: list[RunData], classification_config: ExperimentClassificationConfig | None = None
     ) -> pd.DataFrame:

Then update the call site at line 59:

-        df = self._runs_to_dataframe(runs, available_metrics, classification_config)
+        df = self._runs_to_dataframe(runs, classification_config)

202-233: Potential issue: iloc[0] access on potentially empty group DataFrame.

In _extract_experiment_types, if df[df[group_by] == group_val] results in an empty DataFrame for some group_val, iloc[0] will raise an IndexError. While unlikely given the loop iterates over df[group_by].unique(), defensive coding would be safer.

         experiment_types = {}
         for group_val in df[group_by].unique():
             group_df = df[df[group_by] == group_val]
-            experiment_types[group_val] = group_df["experiment_type"].iloc[0]
+            if not group_df.empty:
+                experiment_types[group_val] = group_df["experiment_type"].iloc[0]
tests/unit/plot/test_plot_config.py (2)

72-106: Redundant environment variable and Path.home monkeypatch.

The test sets the HOME env var via monkeypatch (line 77) but also patches Path.home() directly (line 101). The Path.home() patch is sufficient and makes the env var setting unnecessary.

     def test_load_user_config(self, tmp_path, monkeypatch):
         """Test loading from user home config (~/.aiperf/plot_config.yaml)."""
         # Create fake home directory
         fake_home = tmp_path / "home"
         fake_home.mkdir()
-        monkeypatch.setenv("HOME", str(fake_home))

         # Create user config
         user_config_dir = fake_home / ".aiperf"

906-940: Consider using @pytest.mark.parametrize for percentile tests.

The tests for p50, p90, p99 percentiles create separate plot configs but test the same behavior. This is a good candidate for parametrization per coding guidelines.

@pytest.mark.parametrize("stat", ["p50", "p90", "p99"])  # fmt: skip
def test_aggregated_metric_with_percentile(self, tmp_path, stat):
    """Test dynamic resolution with percentiles."""
    config_file = tmp_path / "config.yaml"
    config_file.write_text(f"""
visualization:
  multi_run_defaults:
    - test_plot
  multi_run_plots:
    test_plot:
      type: scatter
      x: request_latency_{stat}
      y: request_throughput_avg
  single_run_defaults: []
  single_run_plots: {{}}
""")
    config = PlotConfig(config_file)
    specs = config.get_multi_run_plot_specs()
    assert specs[0].metrics[0].stat == stat
src/aiperf/plot/config.py (5)

133-136: Blocking file I/O detected.

Per coding guidelines, I/O operations should use async/await. The synchronous open() call here blocks the event loop if this code runs in an async context.

If this module is used in an async context, consider using aiofiles or wrapping in asyncio.to_thread(). However, if this is only called during initialization/startup (which appears to be the case), blocking I/O may be acceptable.
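
If that ever changes, a minimal sketch of the to_thread approach (assuming PyYAML, which YAML config loading already implies):

```python
import asyncio
from pathlib import Path

import yaml

async def load_yaml_async(path: Path) -> dict:
    """Read and parse the config file without blocking the event loop's thread."""
    text = await asyncio.to_thread(path.read_text)
    return yaml.safe_load(text) or {}
```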


92-92: Consider using logging instead of print.

Direct print() statements bypass the logging infrastructure and cannot be controlled via log levels. This applies to lines 92, 108-110, and 113 as well.

-            print(f"Using config: {self.custom_path}")
+            logger.info(f"Using config: {self.custom_path}")

341-355: Calling get_experiment_classification_config() inside _preset_to_plot_spec is inefficient.

This method is called for every plot preset in a loop. Each call re-parses the same config section repeatedly.

Cache the experiment classification config at the class level during initialization or pass it as a parameter:

 def __init__(self, config_path: Path | None = None, verbose: bool = False) -> None:
     ...
     self.resolved_path = self._resolve_config_path()
     self.config = self._load_yaml()
+    self._experiment_classification_config = self._parse_experiment_classification_config()

Then reference self._experiment_classification_config in _preset_to_plot_spec.


150-153: Broad exception catch re-raises as ValueError, potentially masking the original exception type.

The except Exception clause catches all exceptions including the already-raised ValueError from lines 139-146, wrapping them again. This could result in nested ValueError chains.

Consider catching only non-ValueError exceptions or restructuring the try/except:

-        except Exception as e:
-            raise ValueError(
-                f"Failed to load YAML config from {self.resolved_path}: {e}"
-            ) from e
+        except ValueError:
+            raise  # Re-raise validation errors as-is
+        except Exception as e:
+            raise ValueError(
+                f"Failed to load YAML config from {self.resolved_path}: {e}"
+            ) from e

155-259: Significant code duplication between get_multi_run_plot_specs and get_single_run_plot_specs.

Both methods follow the same pattern: validate list/dict types, iterate defaults, look up presets, call _preset_to_plot_spec, and handle errors identically.

Extract a shared helper method:

def _get_plot_specs(
    self, defaults_key: str, presets_key: str, plot_type_name: str
) -> list[PlotSpec]:
    viz_config = self.config.get("visualization", {})
    defaults = viz_config.get(defaults_key, [])
    if not isinstance(defaults, list):
        raise ValueError(...)
    presets = viz_config.get(presets_key, {})
    if not isinstance(presets, dict):
        raise ValueError(...)
    # ... shared logic
src/aiperf/plot/core/plot_generator.py (1)

394-400: Silent exception swallowing could mask legitimate errors.

The except Exception: pass silently ignores all exceptions from MetricRegistry.get_class(), including potential bugs or misconfigurations beyond "metric not found."

Log at debug level to aid troubleshooting:

         try:
             metric_class = MetricRegistry.get_class(metric_tag)
             if metric_class.has_flags(MetricFlags.LARGER_IS_BETTER):
                 return PlotMetricDirection.HIGHER
             return PlotMetricDirection.LOWER
-        except Exception:
-            pass
+        except KeyError:
+            pass  # Metric not in registry, check derived metrics
+        except Exception as e:
+            logger = logging.getLogger(__name__)
+            logger.debug(f"Error checking metric direction for '{metric_tag}': {e}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c082d6 and 4ecdd10.

⛔ Files ignored due to path filters (9)
  • docs/diagrams/plot_examples/multi_run/config_experiment_classification/pareto_curve_throughput_per_gpu_vs_interactivity.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/config_experiment_classification/ttft_vs_throughput.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/pareto_curve_throughput_per_gpu_vs_interactivity.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/pareto_curve_throughput_per_gpu_vs_latency.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/theme_dark_mode/pareto_curve_throughput_per_gpu_vs_interactivity.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/theme_dark_mode/pareto_curve_throughput_per_gpu_vs_latency.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/theme_dark_mode/ttft_vs_throughput.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/ttft_vs_throughput.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/single_run/timeslices/timeslices_throughput_warning.png is excluded by !**/*.png
📒 Files selected for processing (24)
  • docs/tutorials/plot.md (3 hunks)
  • src/aiperf/cli.py (2 hunks)
  • src/aiperf/cli_utils.py (3 hunks)
  • src/aiperf/common/enums/__init__.py (2 hunks)
  • src/aiperf/common/enums/metric_enums.py (1 hunks)
  • src/aiperf/plot/cli_runner.py (4 hunks)
  • src/aiperf/plot/config.py (1 hunks)
  • src/aiperf/plot/constants.py (1 hunks)
  • src/aiperf/plot/core/data_loader.py (7 hunks)
  • src/aiperf/plot/core/plot_generator.py (12 hunks)
  • src/aiperf/plot/core/plot_specs.py (3 hunks)
  • src/aiperf/plot/default_plot_config.yaml (1 hunks)
  • src/aiperf/plot/exporters/png/multi_run.py (6 hunks)
  • src/aiperf/plot/exporters/png/single_run.py (2 hunks)
  • src/aiperf/plot/handlers/multi_run_handlers.py (5 hunks)
  • src/aiperf/plot/logging.py (3 hunks)
  • src/aiperf/plot/metric_names.py (3 hunks)
  • src/aiperf/plot/plot_controller.py (5 hunks)
  • tests/unit/plot/test_cli_runner.py (1 hunks)
  • tests/unit/plot/test_experiment_classification.py (1 hunks)
  • tests/unit/plot/test_plot_config.py (1 hunks)
  • tests/unit/plot/test_plot_controller.py (1 hunks)
  • tests/unit/plot/test_plot_generator.py (10 hunks)
  • tests/unit/plot/test_png_exporter.py (28 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use async/await for all I/O operations; never use time.sleep() or blocking calls
Always use orjson for JSON operations: orjson.loads(s) and orjson.dumps(d)
All functions must have type hints on parameters and return types
Use Python 3.10+ union syntax (|) instead of typing.Union; use match/case for pattern matching; use @DataClass(slots=True)

Files:

  • src/aiperf/plot/constants.py
  • tests/unit/plot/test_plot_controller.py
  • src/aiperf/common/enums/__init__.py
  • tests/unit/plot/test_experiment_classification.py
  • tests/unit/plot/test_plot_config.py
  • src/aiperf/plot/logging.py
  • src/aiperf/plot/metric_names.py
  • src/aiperf/common/enums/metric_enums.py
  • src/aiperf/cli_utils.py
  • src/aiperf/plot/plot_controller.py
  • src/aiperf/plot/handlers/multi_run_handlers.py
  • src/aiperf/cli.py
  • src/aiperf/plot/core/plot_specs.py
  • src/aiperf/plot/config.py
  • tests/unit/plot/test_png_exporter.py
  • src/aiperf/plot/exporters/png/multi_run.py
  • src/aiperf/plot/core/data_loader.py
  • tests/unit/plot/test_cli_runner.py
  • src/aiperf/plot/exporters/png/single_run.py
  • tests/unit/plot/test_plot_generator.py
  • src/aiperf/plot/cli_runner.py
  • src/aiperf/plot/core/plot_generator.py
**/*test*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test files must use pytest with fixtures, helpers, and @pytest.mark.parametrize; import statements at the top; use # fmt: skip for long parameterize blocks

Files:

  • tests/unit/plot/test_plot_controller.py
  • tests/unit/plot/test_experiment_classification.py
  • tests/unit/plot/test_plot_config.py
  • tests/unit/plot/test_png_exporter.py
  • tests/unit/plot/test_cli_runner.py
  • tests/unit/plot/test_plot_generator.py
🧠 Learnings (2)
📚 Learning: 2025-11-25T00:08:56.802Z
Learnt from: CR
Repo: ai-dynamo/aiperf PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-25T00:08:56.802Z
Learning: Applies to **/config/*.py : Configuration classes must inherit from BaseConfig and include Field(description='...') for all Pydantic fields

Applied to files:

  • src/aiperf/plot/core/plot_specs.py
📚 Learning: 2025-11-25T00:08:56.802Z
Learnt from: CR
Repo: ai-dynamo/aiperf PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-25T00:08:56.802Z
Learning: Applies to **/models/*.py : Data models must inherit from AIPerfBaseModel and use Field(description='...') for all Pydantic fields

Applied to files:

  • src/aiperf/plot/core/plot_specs.py
  • src/aiperf/plot/core/data_loader.py
🧬 Code graph analysis (15)
src/aiperf/common/enums/__init__.py (1)
src/aiperf/common/enums/metric_enums.py (1)
  • PlotMetricDirection (417-424)
tests/unit/plot/test_experiment_classification.py (2)
src/aiperf/plot/core/data_loader.py (2)
  • _classify_experiment_type (785-811)
  • _extract_experiment_group (813-843)
src/aiperf/plot/core/plot_specs.py (1)
  • ExperimentClassificationConfig (48-73)
tests/unit/plot/test_plot_config.py (2)
src/aiperf/plot/config.py (4)
  • PlotConfig (40-448)
  • get_multi_run_plot_specs (155-206)
  • get_single_run_plot_specs (208-259)
  • get_experiment_classification_config (261-289)
src/aiperf/plot/core/plot_specs.py (5)
  • DataSource (76-82)
  • MetricSpec (97-109)
  • PlotSpec (112-173)
  • PlotType (85-94)
  • TimeSlicePlotSpec (176-183)
src/aiperf/plot/metric_names.py (1)
src/aiperf/common/enums/metric_enums.py (2)
  • MetricFlags (612-708)
  • MetricType (399-414)
src/aiperf/plot/plot_controller.py (6)
src/aiperf/plot/config.py (3)
  • get_experiment_classification_config (261-289)
  • get_multi_run_plot_specs (155-206)
  • get_single_run_plot_specs (208-259)
src/aiperf/plot/core/data_loader.py (1)
  • DataLoader (153-926)
src/aiperf/plot/core/mode_detector.py (2)
  • ModeDetector (26-240)
  • VisualizationMode (19-23)
src/aiperf/plot/exporters/png/multi_run.py (2)
  • MultiRunPNGExporter (25-233)
  • export (37-81)
src/aiperf/plot/exporters/png/single_run.py (2)
  • SingleRunPNGExporter (24-129)
  • export (39-77)
src/aiperf/plot/logging.py (1)
  • setup_plot_logging (56-108)
src/aiperf/plot/handlers/multi_run_handlers.py (1)
src/aiperf/plot/exporters/png/multi_run.py (1)
  • _extract_experiment_types (202-233)
src/aiperf/cli.py (2)
src/aiperf/cli_utils.py (1)
  • exit_on_error (62-125)
src/aiperf/plot/cli_runner.py (1)
  • run_plot_controller (11-54)
src/aiperf/plot/config.py (2)
src/aiperf/plot/core/plot_specs.py (7)
  • DataSource (76-82)
  • ExperimentClassificationConfig (48-73)
  • MetricSpec (97-109)
  • PlotSpec (112-173)
  • PlotType (85-94)
  • Style (15-45)
  • TimeSlicePlotSpec (176-183)
src/aiperf/plot/metric_names.py (3)
  • get_aggregated_metrics (110-127)
  • get_request_metrics (130-149)
  • get_timeslice_metrics (152-168)
tests/unit/plot/test_png_exporter.py (3)
src/aiperf/plot/core/plot_specs.py (6)
  • DataSource (76-82)
  • MetricSpec (97-109)
  • PlotSpec (112-173)
  • PlotType (85-94)
  • Style (15-45)
  • TimeSlicePlotSpec (176-183)
src/aiperf/plot/exporters/png/multi_run.py (2)
  • export (37-81)
  • MultiRunPNGExporter (25-233)
src/aiperf/plot/exporters/png/single_run.py (1)
  • export (39-77)
src/aiperf/plot/core/data_loader.py (2)
src/aiperf/plot/core/plot_specs.py (1)
  • ExperimentClassificationConfig (48-73)
src/aiperf/common/protocols.py (1)
  • warning (71-71)
tests/unit/plot/test_cli_runner.py (1)
src/aiperf/plot/constants.py (2)
  • PlotMode (32-35)
  • PlotTheme (38-42)
src/aiperf/plot/exporters/png/single_run.py (3)
src/aiperf/plot/exporters/png/multi_run.py (1)
  • export (37-81)
src/aiperf/plot/exporters/base.py (1)
  • export (50-57)
src/aiperf/plot/core/plot_specs.py (1)
  • PlotSpec (112-173)
tests/unit/plot/test_plot_generator.py (1)
src/aiperf/plot/core/plot_generator.py (2)
  • _prepare_groups (271-366)
  • detect_directional_outliers (72-112)
src/aiperf/plot/cli_runner.py (3)
src/aiperf/cli.py (1)
  • plot (39-77)
src/aiperf/plot/constants.py (2)
  • PlotMode (32-35)
  • PlotTheme (38-42)
src/aiperf/plot/plot_controller.py (2)
  • PlotController (18-168)
  • run (68-79)
src/aiperf/plot/core/plot_generator.py (2)
src/aiperf/common/enums/metric_enums.py (3)
  • PlotMetricDirection (417-424)
  • MetricFlags (612-708)
  • has_flags (691-694)
src/aiperf/plot/metric_names.py (1)
  • get_metric_display_name (91-107)
🪛 markdownlint-cli2 (0.18.1)
docs/tutorials/plot.md

253-253: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


388-388: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🪛 Ruff (0.14.7)
src/aiperf/plot/metric_names.py

34-42: Consider iterable unpacking instead of concatenation

(RUF005)


45-56: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

src/aiperf/plot/plot_controller.py

52-52: Do not catch blind exception: Exception

(BLE001)


142-142: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/plot/core/plot_specs.py

152-155: Prefer TypeError exception for invalid type

(TRY004)


152-155: Avoid specifying long messages outside the exception class

(TRY003)


158-160: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/plot/config.py

89-91: Avoid specifying long messages outside the exception class

(TRY003)


100-103: Avoid specifying long messages outside the exception class

(TRY003)


129-131: Avoid specifying long messages outside the exception class

(TRY003)


139-141: Prefer TypeError exception for invalid type

(TRY004)


139-141: Abstract raise to an inner function

(TRY301)


139-141: Avoid specifying long messages outside the exception class

(TRY003)


144-146: Abstract raise to an inner function

(TRY301)


144-146: Avoid specifying long messages outside the exception class

(TRY003)


148-148: Consider moving this statement to an else block

(TRY300)


151-153: Avoid specifying long messages outside the exception class

(TRY003)


169-172: Prefer TypeError exception for invalid type

(TRY004)


169-172: Avoid specifying long messages outside the exception class

(TRY003)


176-179: Prefer TypeError exception for invalid type

(TRY004)


176-179: Avoid specifying long messages outside the exception class

(TRY003)


185-187: Abstract raise to an inner function

(TRY301)


185-187: Avoid specifying long messages outside the exception class

(TRY003)


201-204: Avoid specifying long messages outside the exception class

(TRY003)


222-225: Prefer TypeError exception for invalid type

(TRY004)


222-225: Avoid specifying long messages outside the exception class

(TRY003)


229-232: Prefer TypeError exception for invalid type

(TRY004)


229-232: Avoid specifying long messages outside the exception class

(TRY003)


238-240: Abstract raise to an inner function

(TRY301)


238-240: Avoid specifying long messages outside the exception class

(TRY003)


254-257: Avoid specifying long messages outside the exception class

(TRY003)


279-282: Prefer TypeError exception for invalid type

(TRY004)


279-282: Avoid specifying long messages outside the exception class

(TRY003)


287-289: Avoid specifying long messages outside the exception class

(TRY003)


308-310: Prefer TypeError exception for invalid type

(TRY004)


308-310: Avoid specifying long messages outside the exception class

(TRY003)


314-314: Avoid specifying long messages outside the exception class

(TRY003)


339-339: Avoid specifying long messages outside the exception class

(TRY003)


415-418: Avoid specifying long messages outside the exception class

(TRY003)


438-441: Avoid specifying long messages outside the exception class

(TRY003)

tests/unit/plot/test_png_exporter.py

852-852: Unused method argument: sample_plot_specs

(ARG002)


946-946: Unused method argument: sample_plot_specs

(ARG002)

src/aiperf/plot/exporters/png/multi_run.py

121-121: Unused method argument: available_metrics

(ARG002)

tests/unit/plot/test_plot_generator.py

361-361: Unpacked variable groups is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


361-361: Unpacked variable display_names is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


382-382: Unpacked variable groups1 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


382-382: Unpacked variable display_names1 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


385-385: Unpacked variable groups2 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


385-385: Unpacked variable display_names2 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


399-399: Unpacked variable groups is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


399-399: Unpacked variable display_names is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1083-1083: Unpacked variable display_names is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1096-1096: Unpacked variable groups is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1096-1096: Unpacked variable display_names is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1127-1127: Unpacked variable display_names is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

src/aiperf/plot/core/plot_generator.py

398-398: Consider moving this statement to an else block

(TRY300)


399-400: try-except-pass detected, consider logging the exception

(S110)


399-399: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: integration-tests (macos-latest, 3.13)
  • GitHub Check: integration-tests (ubuntu-latest, 3.11)
  • GitHub Check: integration-tests (ubuntu-latest, 3.13)
  • GitHub Check: integration-tests (macos-latest, 3.12)
  • GitHub Check: integration-tests (macos-latest, 3.10)
  • GitHub Check: integration-tests (macos-latest, 3.11)
  • GitHub Check: build (macos-latest, 3.11)
  • GitHub Check: build (ubuntu-latest, 3.12)
  • GitHub Check: integration-tests (ubuntu-latest, 3.12)
  • GitHub Check: integration-tests (ubuntu-latest, 3.10)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: build (macos-latest, 3.12)
  • GitHub Check: build (ubuntu-latest, 3.11)
  • GitHub Check: build (ubuntu-latest, 3.10)
  • GitHub Check: build (ubuntu-latest, 3.13)
  • GitHub Check: build (macos-latest, 3.13)
🔇 Additional comments (46)
src/aiperf/common/enums/metric_enums.py (1)

417-424: PlotMetricDirection enum looks consistent and clear

The new PlotMetricDirection enum integrates cleanly with existing enum patterns (CaseInsensitiveStrEnum) and the HIGHER/LOWER semantics are well documented for plotting use.

src/aiperf/cli.py (1)

38-76: Config and verbose wiring for plot command looks solid

The new config and verbose parameters are well documented and correctly propagated:

  • exit_on_error(..., show_traceback=verbose) matches the “show detailed error tracebacks” description.
  • run_plot_controller(..., theme=theme, config=config, verbose=verbose) ensures PlotController receives the same options the user passed on the CLI.
  • Examples in the docstring clearly illustrate the new flags.

No issues from a correctness standpoint.

src/aiperf/common/enums/__init__.py (1)

51-75: PlotMetricDirection correctly added to the public enums surface

PlotMetricDirection is imported from metric_enums and included in __all__, so consumers using from aiperf.common.enums import PlotMetricDirection will see the new enum as expected. This matches how other metric-related enums are exposed.

Also applies to: 113-179

src/aiperf/plot/cli_runner.py (1)

7-8: New config and verbose parameters are correctly threaded through to PlotController

  • Signature and type hints align with the CLI entrypoint and PlotController’s config_path/verbose parameters.
  • config_path = Path(config) if config else None is the expected conversion.
  • The docstring updates accurately describe the new arguments.

End-to-end, this keeps the CLI and runner surfaces consistent.

Also applies to: 11-48

tests/unit/plot/test_plot_controller.py (1)

274-279: Updated assertion keeps the test resilient while checking the important signal

Switching the check to:

captured = capsys.readouterr()
assert "Failed to load run" in captured.out

focuses the test on the substantive message rather than its exact formatting (e.g., dropped “Warning:” prefix). That keeps the test useful while reducing brittleness against cosmetic log changes.

tests/unit/plot/test_cli_runner.py (1)

284-299: LGTM!

The test correctly validates that verbose is forwarded to PlotController and that config_path=None is explicitly passed. The test structure follows pytest conventions with proper use of fixtures and mocks.

src/aiperf/plot/exporters/png/single_run.py (1)

39-59: LGTM!

The signature update to accept plot_specs aligns with the MultiRunPNGExporter API and enables config-driven plot generation. Type hints are properly specified.

src/aiperf/cli_utils.py (2)

71-88: LGTM!

The show_traceback parameter provides useful control over error output verbosity. Default True maintains backward compatibility.


104-113: LGTM!

The conditional traceback display is correctly implemented, only showing the full exception details when show_traceback is True.

tests/unit/plot/test_plot_generator.py (2)

116-122: LGTM!

Test assertions correctly updated to validate the new Pareto plot behavior with directional arrows in axis labels and optimal direction subtitle in titles.


941-953: LGTM!

Outlier detection test assertions correctly validate directional behavior for latency vs throughput metrics.

tests/unit/plot/test_experiment_classification.py (5)

1-17: LGTM!

Well-structured test module with proper imports, type hints, and pytest conventions. The test class is appropriately named and scoped.


19-68: LGTM!

Good coverage of basic classification scenarios with baseline and treatment pattern matching, including the important test for multiple patterns.


70-156: LGTM!

Excellent edge case coverage including first-match-wins behavior, default fallback, case sensitivity, and full path matching. These tests validate important classification semantics.


158-234: LGTM!

Good coverage of _extract_experiment_group behavior including top-level runs, concurrency subdirectories, and fallback when patterns don't match.


236-317: LGTM!

Well-designed parametrized tests for various glob patterns. The nested concurrency directory tests properly validate the immediate parent extraction logic.

src/aiperf/plot/handlers/multi_run_handlers.py (4)

61-85: LGTM with a note on duplication.

The helper correctly extracts experiment types from the DataFrame. Note that similar logic exists in src/aiperf/plot/exporters/png/multi_run.py (lines 201-232). Consider extracting to a shared utility if this pattern grows.


87-111: LGTM!

Clean implementation for extracting group display names. Mirrors the pattern of _extract_experiment_types appropriately.


143-157: LGTM!

Correctly integrates the new helper methods and passes the extracted mappings to create_pareto_plot.


189-203: LGTM!

Same pattern correctly applied for scatter line plots.

docs/tutorials/plot.md (3)

63-65: LGTM!

Good placement for the tip about Experiment Classification, providing early guidance for users.


180-221: LGTM!

Clear documentation of grouping customization with practical YAML examples.


386-388: LGTM!

Good best practice tip. The blank line flagged by markdownlint between blockquotes is a minor formatting issue that doesn't affect rendering in most Markdown processors.

src/aiperf/plot/core/plot_specs.py (2)

48-74: LGTM!

ExperimentClassificationConfig correctly inherits from BaseConfig and all fields have proper Field(description=...) annotations. Based on learnings, this follows the project conventions.


124-133: LGTM!

Clear documentation of the updated label_by and group_by field semantics.

src/aiperf/plot/plot_controller.py (3)

50-56: Logging fallback is reasonable despite static analysis hint.

The bare Exception catch (BLE001) is acceptable here since this is a best-effort logging setup with a graceful fallback to console-only logging. The fallback ensures the application continues even if file logging fails.


145-153: LGTM - plot specs and classification config are properly wired to exporter.

The PlotConfig integration correctly retrieves plot specs and classification config, passing them to the MultiRunPNGExporter.


166-168: LGTM - single-run export correctly passes plot specs.

The single-run export path properly retrieves and forwards plot specs to the SingleRunPNGExporter.

tests/unit/plot/test_png_exporter.py (2)

199-269: LGTM - Multi-run plot specs fixture is well-structured.

The sample_multi_run_plot_specs fixture correctly demonstrates various plot types with proper metric configurations including stats (avg, p50) for aggregated data sources.


1850-2050: Thorough validation tests for PlotSpec list normalization.

The TestPlotSpecListValidation class provides comprehensive coverage for the list-to-string normalization behavior, including edge cases for single elements, multiple elements, None values, and error conditions for plain strings.

src/aiperf/plot/exporters/png/multi_run.py (1)

176-200: LGTM - DataFrame construction with classification support is well-implemented.

The logic correctly handles:

  • Display name mapping with fallback via fillna()
  • Informative logging of unique experiment groups and types
  • Graceful handling when classification_config is None
tests/unit/plot/test_plot_config.py (2)

1-22: LGTM - Well-structured test module with proper imports.

The test file follows pytest conventions with clear class organization. All necessary imports from the plot configuration module are present.


416-566: Thorough classification override tests.

The TestExperimentClassificationOverride class provides excellent coverage for the experiment classification feature, including:

  • Override behavior when enabled
  • Preservation of original groups when disabled
  • Override of all group types
  • Nested directory structures
src/aiperf/plot/default_plot_config.yaml (4)

1-44: Well-documented configuration file with clear user guidance.

The header provides excellent documentation on customization workflow, quick start guide, and the three-tier configuration priority. The defaults lists are comprehensive and cover the expected use cases.


45-80: Multi-run plot presets are correctly structured.

The multi-run plots define proper type, x/y metrics with stat suffixes, labels, groups, and titles. The configuration guide in comments (lines 49-56) clearly explains the groups/labels behavior.


81-158: Single-run plot presets with comprehensive coverage.

The single-run presets cover various visualization needs:

  • Per-request scatter plots (TTFT, ITL)
  • Time-series with percentiles
  • Throughput area charts
  • Timeslice histograms
  • Dual-axis GPU utilization overlay

The dual_axis configuration (lines 145-157) correctly defines primary/secondary styles and supplementary column.


159-185: Experiment classification documentation is clear and helpful.

The commented example provides clear guidance on:

  • Semantic color assignment (grey for baselines, green for treatments)
  • Pattern matching priority
  • Glob pattern syntax examples

The default is set to treatment which is a reasonable fallback.

src/aiperf/plot/config.py (1)

1-448: LGTM overall — the configuration loading logic is well-structured.

The three-tier config resolution (custom → user → default), auto-creation of user config, and structured YAML validation provide a solid foundation for extensible plot configuration.
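
For orientation, a condensed sketch of that resolution order, rendered as a free function; the real implementation also validates the custom path and auto-creates the user config:

```python
from pathlib import Path

PACKAGED_DEFAULTS = Path(__file__).parent / "default_plot_config.yaml"

def resolve_config_path(custom_path: Path | None) -> Path:
    """Three-tier resolution: explicit config path, then ~/.aiperf/plot_config.yaml, then packaged defaults."""
    if custom_path is not None:
        return custom_path  # PlotConfig also checks that this file exists
    user_config = Path.home() / ".aiperf" / "plot_config.yaml"
    if user_config.exists():
        return user_config
    # On first run, PlotConfig copies the defaults to the user path so they can be customized (not shown).
    return PACKAGED_DEFAULTS
```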

src/aiperf/plot/core/data_loader.py (3)

59-66: LGTM — new RunMetadata fields follow the data model conventions.

The experiment_type and experiment_group fields properly use Field(description=...) as required by the coding guidelines. Based on learnings, data models must inherit from AIPerfBaseModel and use Field descriptions.


161-172: LGTM — DataLoader initialization properly accepts optional classification config.

The optional classification_config parameter with None default maintains backward compatibility.


785-811: LGTM — _classify_experiment_type correctly implements pattern-based classification.

The method properly checks baseline patterns first, then treatment patterns, and falls back to the config default. The use of fnmatch for glob pattern matching is appropriate.
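
A condensed, free-standing sketch of that flow; the real method takes the run path and an ExperimentClassificationConfig, whose attribute names are not reproduced here:

```python
from fnmatch import fnmatch

def classify_experiment_type(
    run_name: str,
    baseline_patterns: list[str],
    treatment_patterns: list[str],
    default: str = "treatment",
) -> str:
    """Baseline globs win, then treatment globs, then the configured default."""
    if any(fnmatch(run_name, pattern) for pattern in baseline_patterns):
        return "baseline"
    if any(fnmatch(run_name, pattern) for pattern in treatment_patterns):
        return "treatment"
    return default
```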

src/aiperf/plot/core/plot_generator.py (5)

471-534: LGTM — optimal quadrant shading implementation is well-designed.

The _add_optimal_quadrant_shading method correctly:

  • Guards against missing directions or empty data
  • Calculates rectangle bounds based on metric directions
  • Uses a subtle green overlay that doesn't obscure data
  • Places the "★ Optimal" annotation at the correct corner with appropriate anchoring

320-356: LGTM — semantic coloring for experiment groups is clear and well-documented.

The logic correctly assigns:

  • Grey for all baselines
  • NVIDIA Green for the first treatment
  • Seaborn "bright" palette for additional treatments

The ordered output (baselines first, then treatments) ensures consistent legend ordering.
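
A simplified sketch of that assignment; the grey shades and helper name are illustrative (#76B900 is NVIDIA's brand green), and the actual logic in plot_generator.py may differ:

```python
import seaborn as sns

NVIDIA_GREEN = "#76B900"
GREY_SHADES = ["#6e6e6e", "#9e9e9e", "#c4c4c4"]  # illustrative baseline greys

def assign_group_colors(baselines: list[str], treatments: list[str]) -> dict[str, str]:
    """Baselines get greys (listed first), the first treatment NVIDIA green, the rest 'bright' palette colors."""
    colors: dict[str, str] = {}
    for i, name in enumerate(baselines):
        colors[name] = GREY_SHADES[i % len(GREY_SHADES)]
    extra: list[str] = []
    if len(treatments) > 1:
        extra = sns.color_palette("bright", len(treatments) - 1).as_hex()
    for i, name in enumerate(treatments):
        colors[name] = NVIDIA_GREEN if i == 0 else extra[i - 1]
    return colors  # insertion order keeps baselines ahead of treatments in the legend
```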


449-458: Edge case: y_word assignment may be incorrect for LOWER direction.

When y_dir == PlotMetricDirection.LOWER, the ternary "high" if y_dir == PlotMetricDirection.HIGHER else "low" correctly yields y_word = "low" (line 463). The quadrant determination at lines 455-458 also matches its "lower-right" comment (which implies x is HIGHER), so both cases are handled correctly.

The quadrant determination logic correctly maps all four combinations of x/y directions to their respective corner names.


578-589: LGTM — directional arrows enhance axis label clarity.

Adding / arrows to axis labels and generating explanatory subtitles helps users understand optimization direction at a glance.


1407-1414: The ZeroDivisionError concern is technically inaccurate. The code range(int(max_slice) + 1) does not raise ZeroDivisionError if max_slice is negative; it simply creates an empty range. The actual risk is empty tick lists if max_slice is unexpectedly small or if upstream data validation is insufficient. Verify upstream callers to confirm whether max_slice is guaranteed to be a valid non-negative integer and whether empty dataframes are handled before reaching this code.
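
A quick illustration of the empty-range behavior described above:

```python
max_slice = -3
print(list(range(int(max_slice) + 1)))  # [] — empty range, no exception; downstream tick lists would be empty
```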

@ilana-n ilana-n changed the title from "feat: add experiment classification and plot configuration system" to "feat: AIPerf plot configuration system" on Dec 1, 2025
@ilana-n ilana-n force-pushed the ilana/plot_configure_yaml branch from 4ecdd10 to 60c29d5 on December 2, 2025 01:13
@ilana-n ilana-n force-pushed the ilana/plot_configure_yaml branch from 60c29d5 to f773efe on December 2, 2025 18:15

github-actions bot commented Dec 2, 2025

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@7e483d94780acc98bc60801dbf483063e6c40026

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@7e483d94780acc98bc60801dbf483063e6c40026

Last updated for commit: 7e483d9

@ilana-n ilana-n requested a review from lkomali December 3, 2025 18:08