
Conversation


@ilana-n ilana-n commented Dec 1, 2025

Plot Configuration & Visualization Enhancements

Summary

This PR builds on top of #459 to introduce configurable YAML-based plot generation and experiment classification with default coloring.

Key Additions

Plot Generation System

  • Optimal direction indicators: Automatic axis labels with ↑↓ indicators and optimal quadrant shading

Configurable YAML System

  • default_plot_config.yaml: Centralized configuration for all plot types
  • Experiment classification: Default color assignment for baseline/treatment comparison
    • Groups by experiment_group (directory name) to preserve treatment variants
    • Baselines: Grey shades, listed first in legend
    • Treatments: NVIDIA green for first, distinct colors for remaining
    • Automatically overrides the groups: setting when classification is enabled
  • Smart default grouping:
    • Without classification: Defaults to run_name (each run is a separate line)
    • With classification: Forced to experiment_group for semantic coloring
  • User customization: Auto-creates ~/.aiperf/plot_config.yaml for users to modify (see the config sketch after this list)
  • Verbose mode: --verbose flag for detailed logging with error tracebacks
  • Improved logging: Logs saved to output directory alongside plots
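
For a concrete feel of the workflow, here is a minimal sketch modeled on the unit tests in this PR. The preset name and metric choices are illustrative, and only the type/x/y keys shown in those tests are assumed:

```python
from pathlib import Path

from aiperf.plot.config import PlotConfig

# Hypothetical minimal config: one multi-run scatter preset, no single-run plots.
config_file = Path("my_plot_config.yaml")
config_file.write_text("""
visualization:
  multi_run_defaults:
    - latency_vs_throughput
  multi_run_plots:
    latency_vs_throughput:
      type: scatter
      x: request_latency_p50
      y: request_throughput_avg
  single_run_defaults: []
  single_run_plots: {}
""")

config = PlotConfig(config_file)           # custom path takes priority over ~/.aiperf and packaged defaults
specs = config.get_multi_run_plot_specs()  # expands each named preset into a PlotSpec
print(len(specs), specs[0].metrics[0].stat)  # 1 p50 (stat parsed from the metric suffix)
```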

Example Plots:
pareto_curve_throughput_per_gpu_vs_interactivity

ttft_vs_throughput

Summary by CodeRabbit

Release Notes

  • New Features

    • Added experiment classification to organize runs as baselines or treatments with semantic coloring
    • Introduced configuration-driven plotting system with custom plot grouping and multi-run/single-run presets
    • Added metric direction indicators (↑/↓) and optimal quadrant shading for scatter/Pareto plots
    • Enhanced group display customization and experiment group extraction for clearer visualizations
    • Added verbose mode for detailed error reporting
  • Documentation

    • New tutorial sections on customizing plot grouping and experiment classification for multi-run analysis



github-actions bot commented Dec 1, 2025

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ilana/plot_configure_yaml

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ilana/plot_configure_yaml

@github-actions github-actions bot added the feat label Dec 1, 2025
@ilana-n ilana-n requested a review from Copilot December 1, 2025 22:09

ilana-n commented Dec 1, 2025

@coderabbitai review


coderabbitai bot commented Dec 1, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


Copilot AI left a comment


Pull request overview

This PR introduces a comprehensive experiment classification and plot configuration system for AIPerf, enabling users to customize visualizations and distinguish between baseline and treatment runs with semantic coloring.

Key changes:

  • Added YAML-based plot configuration system with auto-creation of user config on first run
  • Implemented experiment classification to categorize runs as baseline/treatment with semantic colors
  • Enhanced multi-run plots with optimal direction indicators and quadrant shading

Reviewed changes

Copilot reviewed 24 out of 33 changed files in this pull request and generated 5 comments.

Summary per file
File Description
tests/unit/plot/test_png_exporter.py Updated tests to use config-driven plot specs instead of hardcoded constants
tests/unit/plot/test_plot_generator.py Added tests for experiment group coloring and display name handling
tests/unit/plot/test_plot_controller.py Updated warning message assertion for failed run loading
tests/unit/plot/test_plot_config.py New comprehensive test suite for YAML config loading and validation
tests/unit/plot/test_experiment_classification.py New test suite for baseline/treatment classification logic
tests/unit/plot/test_cli_runner.py Updated test to include new verbose and config_path parameters
src/aiperf/plot/plot_controller.py Added config and classification support to controller
src/aiperf/plot/metric_names.py Added helper functions to retrieve metrics by data source
src/aiperf/plot/logging.py Enhanced logging with console-only fallback and verbose mode support
src/aiperf/plot/handlers/multi_run_handlers.py Added experiment type and display name extraction for plots
src/aiperf/plot/exporters/png/single_run.py Updated to accept plot specs as parameter
src/aiperf/plot/exporters/png/multi_run.py Added classification config support and experiment group handling
src/aiperf/plot/default_plot_config.yaml New default YAML configuration file with plot presets
src/aiperf/plot/core/plot_specs.py Added ExperimentClassificationConfig and list validation for grouping fields
src/aiperf/plot/core/plot_generator.py Added semantic coloring, optimal direction indicators, and quadrant shading
src/aiperf/plot/core/data_loader.py Implemented experiment classification and group extraction logic
src/aiperf/plot/constants.py Added derived metric direction mappings
src/aiperf/plot/config.py New PlotConfig class for YAML-based configuration loading
src/aiperf/plot/cli_runner.py Added config and verbose parameters
src/aiperf/common/enums/metric_enums.py Added PlotMetricDirection enum
src/aiperf/common/enums/__init__.py Exported PlotMetricDirection
src/aiperf/cli_utils.py Added show_traceback parameter to exit_on_error
src/aiperf/cli.py Added config and verbose flags to plot command
docs/tutorials/plot.md Added documentation for experiment classification and grouping



coderabbitai bot commented Dec 1, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This pull request adds YAML-based plot configuration management, experiment classification (baseline/treatment grouping), and enhanced visualization features including metric directionality indicators and optimal quadrant shading to NVIDIA AIPerf's plotting system.

Changes

Cohort / File(s) Summary
Documentation & Configuration Defaults
docs/tutorials/plot.md, src/aiperf/plot/default_plot_config.yaml
Introduces comprehensive plotting documentation covering custom plot grouping, experiment classification, and multi-run plot presets. New YAML configuration file defines default plots for both multi-run and single-run analyses with three predefined multi-run plots and six single-run time-series plots.
CLI & Configuration Infrastructure
src/aiperf/cli.py, src/aiperf/cli_utils.py, src/aiperf/plot/cli_runner.py, src/aiperf/plot/config.py
Adds config and verbose parameters to CLI plot command. Introduces PlotConfig class for YAML-based configuration management with three-tier resolution strategy (custom → user home → defaults). Implements error handling with optional traceback display via exit_on_error enhancements.
Enums & Constants
src/aiperf/common/enums/metric_enums.py, src/aiperf/common/enums/__init__.py, src/aiperf/plot/constants.py
Introduces PlotMetricDirection enum (HIGHER/LOWER) for semantic metric directionality and exports it via public API. Adds DERIVED_METRIC_DIRECTIONS mapping for token throughput metrics.
Data Layer & Classification
src/aiperf/plot/core/data_loader.py
Extends RunMetadata with experiment_type and experiment_group fields. Integrates ExperimentClassificationConfig to classify runs as baselines or treatments via glob pattern matching and extract experiment groups from run paths.
Visualization Core
src/aiperf/plot/core/plot_specs.py, src/aiperf/plot/core/plot_generator.py, src/aiperf/plot/metric_names.py
Adds ExperimentClassificationConfig schema. Refactors PlotSpec to accept label_by/group_by as lists with normalization. Removes hardcoded plot specifications. Enhances PlotGenerator with directionality-aware axis labels, optimal quadrant shading, group coloring based on classification, and metric direction utility functions. Adds metric category getter functions.
Exporters
src/aiperf/plot/exporters/png/multi_run.py, src/aiperf/plot/exporters/png/single_run.py
Updates export signatures to accept plot_specs list and optional classification_config. Removes reliance on hardcoded spec constants. Extracts experiment types and group display names from DataFrames for visualization.
Handlers & Logging
src/aiperf/plot/handlers/multi_run_handlers.py, src/aiperf/plot/logging.py, src/aiperf/plot/plot_controller.py
Adds helper methods to extract experiment types and display names in multi-run handlers. Introduces console-only logging setup and enhanced plot logging. Integrates PlotConfig into PlotController with classification and plot specs wiring.
Tests
tests/unit/plot/test_*.py
Adds comprehensive test suites for experiment classification logic, PlotConfig loading and validation, plot controller integration, and exporter functionality. Updates existing tests to accommodate new return values and API signatures.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

  • PlotConfig implementation: Complex YAML loading, validation, metric expansion, and three-tier path resolution logic
  • Experiment classification system: Pattern matching, group extraction, and metadata enrichment across DataLoader
  • Plot visualization enhancements: Multiple new utility functions for directionality, quadrant shading, and color mapping logic
  • API signature changes: Widespread updates to exporter and handler signatures introduce surface area for compatibility issues
  • Test coverage quality: Extensive tests provide confidence but require careful review for assertion correctness, especially around new three-return-value unpacking in _prepare_groups

Areas requiring extra attention:

  • Metric expansion logic in PlotConfig (_expand_metric_shortcut and shortcut resolution paths)
  • Experiment classification pattern matching and group extraction edge cases (concurrency directory handling, path resolution)
  • PlotGenerator's new directional shading and coloring logic, particularly baseline/treatment color assignment and optimal quadrant overlay implementation
  • Test assertions validating title composition with directional subtitles and arrow characters
  • Default vs. explicit configuration precedence and cascade behavior in PlotConfig

Poem

🐰 A config-driven tale unfolds,
With YAML treasures and stories bold,
Baselines in gray, treatments in green,
The finest plots you've ever seen!
Arrows point where metrics gleam—
Experiment classification's dream!

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 97.48% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title 'feat: AIPerf plot configuration system' accurately and concisely describes the main feature being added—a plot configuration system for AIPerf. It aligns with the primary changes across all modified files.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (28)
src/aiperf/plot/logging.py (3)

24-55: Global handler reset in setup_console_only_logging can clobber prior logging config

The implementation is fine for a standalone CLI, but iterating over root_logger.handlers[:] and calling removeHandler on each will drop any previously configured root handlers. If aiperf.plot is ever used as a library inside a larger process with its own logging, this will override that configuration.

If you want to keep this reusable, consider either of the following (a sketch of the first option appears after this list):

  • Operating on a dedicated logger hierarchy instead of the root, or
  • Only clearing handlers you know you own (e.g., marked by a custom attribute).
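
A minimal sketch of the first option, confined to the aiperf.plot logger subtree. The signature and formatter here are illustrative, not the module's current API:

```python
import logging

def setup_console_only_logging(level: int = logging.WARNING) -> logging.Logger:
    """Configure a console handler on the 'aiperf.plot' subtree without touching the root logger."""
    plot_logger = logging.getLogger("aiperf.plot")
    plot_logger.setLevel(level)
    plot_logger.propagate = False  # keep plot records out of whatever handlers the host app configured

    # Reset handlers on this subtree only; root handlers stay as the embedding application left them.
    for handler in list(plot_logger.handlers):
        plot_logger.removeHandler(handler)

    console = logging.StreamHandler()
    console.setLevel(level)
    console.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    plot_logger.addHandler(console)
    return plot_logger
```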

61-66: Console verbosity docs vs implementation are slightly out of sync

Docstring says, “Console output shows WARNING+ by default, or all logs when verbose (DEBUG/INFO level).” The current logic:

console_level = level if level == "DEBUG" else "WARNING"

means only log_level=="DEBUG" yields full console output; INFO still shows WARNING+. That’s fine for the current PlotController usage (DEBUG when verbose, INFO otherwise), but the function’s docs suggest INFO could also be “verbose”.

Consider tightening either:

  • The implementation (e.g., treat INFO as console-level INFO as well), or
  • The docstring (e.g., say “all logs when log_level=='DEBUG', otherwise WARNING+”).

79-81: Minor: prefer logging level constants instead of strings

console_level is currently a string ("DEBUG"/"WARNING"). logging happily accepts that, but using logging.DEBUG / logging.WARNING would be more idiomatic and avoids surprises if someone ever passes an unexpected string to log_level.

Not urgent, but worth considering for readability and static tooling.

Also applies to: 91-91

src/aiperf/plot/metric_names.py (2)

33-57: Minor style: align list construction with Ruff’s iterable-unpacking suggestion

The constructions:

_AGGREGATED_METRICS: list[str] = MetricRegistry.tags_applicable_to(...) + [
    "output_token_throughput_per_gpu",
]

_REQUEST_METRICS: list[str] = (
    MetricRegistry.tags_applicable_to(...) + [
        "request_number",
        ...
    ]
)

work fine, but Ruff’s RUF005 hint is suggesting iterable unpacking, which is a bit more idiomatic and avoids an intermediate list for concatenation. For example:

_AGGREGATED_METRICS: list[str] = [
    *MetricRegistry.tags_applicable_to(
        MetricFlags.NONE,
        MetricFlags.NONE,
        MetricType.RECORD,
        MetricType.DERIVED,
    ),
    "output_token_throughput_per_gpu",
]

_REQUEST_METRICS: list[str] = [
    *MetricRegistry.tags_applicable_to(
        MetricFlags.NONE,
        MetricFlags.NONE,
        MetricType.RECORD,
    ),
    "request_number",
    "timestamp",
    "timestamp_s",
    "throughput_tokens_per_sec",
    "active_requests",
]

Purely a readability/micro-optimization tweak; behavior is unchanged.


59-69: Guard against accidental mutation of module-level metric lists

All four getters:

def get_aggregated_metrics() -> list[str]:
    return _AGGREGATED_METRICS

def get_request_metrics() -> list[str]:
    return _REQUEST_METRICS

def get_timeslice_metrics() -> list[str]:
    return _TIMESLICE_METRICS

def get_gpu_metrics() -> list[str]:
    return _GPU_METRICS

return references to module-level lists. Any caller doing get_aggregated_metrics().append(...) will mutate the shared global state, which could lead to hard-to-track bugs in plotting/validation.

If you intend these as read-only definitions, consider one of:

  • Keep internal storage as tuples and return lists:
_AGGREGATED_METRICS: tuple[str, ...] = (...)
# ...
def get_aggregated_metrics() -> list[str]:
    return list(_AGGREGATED_METRICS)
  • Or keep them as lists but return copies:
def get_gpu_metrics() -> list[str]:
    return list(_GPU_METRICS)

Either way, callers can safely modify the returned list without affecting global configuration.

Also applies to: 110-187

src/aiperf/plot/cli_runner.py (1)

52-54: Logs: line may be misleading when logging falls back to console-only

You always print:

print(f"Logs: {output_dir / PLOT_LOG_FILE}")

But PlotController can fall back to setup_console_only_logging if file logging fails (e.g., permission issues), in which case that log file may not exist.

If you want to avoid confusing users in that scenario, consider one of the following (a small guard sketch follows this list):

  • Having PlotController.run() (or the controller instance) expose whether file logging was successfully configured and only print the log path when it is, or
  • Adjusting the message, e.g., Logs (if file logging enabled): ... or similar.
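
A third, low-effort variant is to check for the file before advertising it. A sketch (the helper name and log filename are illustrative, and it cannot distinguish a stale log from this run's log):

```python
from pathlib import Path

def print_log_location(output_dir: Path, log_filename: str = "plot.log") -> None:
    """Only point users at the log file if file logging actually produced one."""
    log_file = output_dir / log_filename
    if log_file.exists():
        print(f"Logs: {log_file}")
    else:
        print("Logs: console only (file logging was not enabled)")
```
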
src/aiperf/plot/constants.py (1)

63-68: DERIVED_METRIC_DIRECTIONS is a good central hook for non-registry metrics

The new mapping for derived metrics:

DERIVED_METRIC_DIRECTIONS = {
    "output_token_throughput_per_gpu": True,
    "output_token_throughput_per_user": True,
}

plus the explanatory comment about True/False semantics provides a clear, centralized way for plotting code to infer direction for metrics that don’t live in MetricRegistry.

Just keep in mind that any future derived metrics added in data processing should also be added here (and to the display-name layer) to avoid silent “unknown direction” cases in plotting.
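
For illustration, a sketch of how a caller can consult the registry and fall back to this mapping. The MetricRegistry import path and its KeyError behavior for unknown tags are assumptions:

```python
from aiperf.common.enums.metric_enums import MetricFlags, PlotMetricDirection
from aiperf.common import MetricRegistry  # import path is an assumption
from aiperf.plot.constants import DERIVED_METRIC_DIRECTIONS

def resolve_direction(metric_tag: str) -> PlotMetricDirection | None:
    """Derived-metric mapping first, then the registry flag; None when the direction is unknown."""
    if metric_tag in DERIVED_METRIC_DIRECTIONS:
        higher_is_better = DERIVED_METRIC_DIRECTIONS[metric_tag]
    else:
        try:
            metric_class = MetricRegistry.get_class(metric_tag)
        except KeyError:  # assumed behavior for unregistered tags
            return None
        higher_is_better = metric_class.has_flags(MetricFlags.LARGER_IS_BETTER)
    return PlotMetricDirection.HIGHER if higher_is_better else PlotMetricDirection.LOWER
```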

tests/unit/plot/test_plot_generator.py (6)

361-367: Prefix unused unpacked variables with underscore.

Static analysis correctly identifies that groups and display_names are unpacked but not used in this test. Per Python convention, prefix unused variables with _.

-        groups, color_map, display_names = plot_generator._prepare_groups(df, "model")
+        _groups, color_map, _display_names = plot_generator._prepare_groups(df, "model")

382-392: Prefix unused unpacked variables with underscore.

Same issue: groups1, groups2, display_names1, and display_names2 are unpacked but never used.

-        groups1, colors1, display_names1 = plot_generator._prepare_groups(df1, "model")
+        _groups1, colors1, _display_names1 = plot_generator._prepare_groups(df1, "model")

         df2 = pd.DataFrame({"model": ["ModelX", "ModelY", "ModelZ"]})
-        groups2, colors2, display_names2 = plot_generator._prepare_groups(df2, "model")
+        _groups2, colors2, _display_names2 = plot_generator._prepare_groups(df2, "model")

399-409: Prefix unused unpacked variables with underscore.

-        groups, color_map, display_names = plot_generator._prepare_groups(df, "model")
+        _groups, color_map, _display_names = plot_generator._prepare_groups(df, "model")

1083-1087: Prefix unused unpacked variable with underscore.

-        groups, color_map, display_names = plot_gen._prepare_groups(df, group_by=None)
+        groups, color_map, _display_names = plot_gen._prepare_groups(df, group_by=None)

1096-1106: Prefix unused unpacked variables with underscore.

-        groups, color_map, display_names = plot_gen._prepare_groups(df, "model")
+        _groups, color_map, _display_names = plot_gen._prepare_groups(df, "model")

1127-1133: Prefix unused unpacked variable with underscore.

-        groups, color_map, display_names = plot_gen._prepare_groups(
+        groups, color_map, _display_names = plot_gen._prepare_groups(
             df, "nonexistent_column"
         )
docs/tutorials/plot.md (1)

223-288: Add language specifier to code block.

The directory structure code block at line 253 should have a language specifier for consistency.

-```
+```text
 artifacts/
 ├── baseline_moderate_io_isl100_osl200_streaming/           # → Grey (baseline; ISL=100, OSL=200)
src/aiperf/plot/core/plot_specs.py (1)

135-160: Consider using TypeError for type validation failures.

Static analysis suggests TypeError is more appropriate than ValueError when the issue is an incorrect type. This aligns with Python conventions where TypeError indicates wrong argument types.

         if isinstance(v, str):
-            raise ValueError(
+            raise TypeError(
                 f"label_by and group_by must be provided as lists of strings, "
                 f"not plain strings. Use [{v!r}] instead of {v!r}"
             )
         if isinstance(v, list):
             return " | ".join(v)
-        raise ValueError(
+        raise TypeError(
             f"label_by and group_by must be lists of strings, got {type(v).__name__}"
         )
src/aiperf/plot/plot_controller.py (1)

61-66: Inconsistent logging: uses print() after setting up logger.

The code sets up a logger but then uses print() for the classification message. For consistency with the logging infrastructure, use the logger instead.

-        if classification_config:
-            print("Experiment classification enabled: grouping runs by baseline/treatment patterns")
+        if classification_config:
+            logger.info("Experiment classification enabled: grouping runs by baseline/treatment patterns")
tests/unit/plot/test_png_exporter.py (1)

45-196: Comprehensive fixture for single-run plot specs.

The sample_plot_specs fixture provides good coverage of various plot types (scatter, area, histogram, dual_axis) with proper metric specifications. The inline import of Style and TimeSlicePlotSpec at line 48 could be moved to the top-level imports for consistency.

Consider moving the import to the top of the file:

 from aiperf.plot.core.plot_specs import (
     DataSource,
     MetricSpec,
     PlotSpec,
     PlotType,
+    Style,
+    TimeSlicePlotSpec,
 )
src/aiperf/plot/exporters/png/multi_run.py (3)

37-43: Missing type hint for classification_config parameter.

The classification_config parameter lacks a type annotation. Based on the codebase context, this should be typed as ExperimentClassificationConfig | None.

As per coding guidelines requiring type hints on all parameters:

+from aiperf.plot.core.plot_specs import ExperimentClassificationConfig, PlotSpec
-from aiperf.plot.core.plot_specs import PlotSpec

     def export(
         self,
         runs: list[RunData],
         available_metrics: dict,
         plot_specs: list[PlotSpec],
-        classification_config=None,
+        classification_config: ExperimentClassificationConfig | None = None,
     ) -> list[Path]:

120-122: available_metrics parameter is unused in _runs_to_dataframe.

The static analysis correctly identifies that available_metrics is passed but not used. Either remove it or document why it's kept for future use.

     def _runs_to_dataframe(
-        self, runs: list[RunData], available_metrics: dict, classification_config=None
+        self, runs: list[RunData], classification_config: ExperimentClassificationConfig | None = None
     ) -> pd.DataFrame:

Then update the call site at line 59:

-        df = self._runs_to_dataframe(runs, available_metrics, classification_config)
+        df = self._runs_to_dataframe(runs, classification_config)

202-233: Potential issue: iloc[0] access on potentially empty group DataFrame.

In _extract_experiment_types, if df[df[group_by] == group_val] results in an empty DataFrame for some group_val, iloc[0] will raise an IndexError. While unlikely given the loop iterates over df[group_by].unique(), defensive coding would be safer.

         experiment_types = {}
         for group_val in df[group_by].unique():
             group_df = df[df[group_by] == group_val]
-            experiment_types[group_val] = group_df["experiment_type"].iloc[0]
+            if not group_df.empty:
+                experiment_types[group_val] = group_df["experiment_type"].iloc[0]
tests/unit/plot/test_plot_config.py (2)

72-106: Redundant environment variable and Path.home monkeypatch.

The test sets the HOME env var via monkeypatch (line 77) but also patches Path.home() directly (line 101). The Path.home() patch is sufficient and makes the env var setting unnecessary.

     def test_load_user_config(self, tmp_path, monkeypatch):
         """Test loading from user home config (~/.aiperf/plot_config.yaml)."""
         # Create fake home directory
         fake_home = tmp_path / "home"
         fake_home.mkdir()
-        monkeypatch.setenv("HOME", str(fake_home))

         # Create user config
         user_config_dir = fake_home / ".aiperf"

906-940: Consider using @pytest.mark.parametrize for percentile tests.

The tests for p50, p90, p99 percentiles create separate plot configs but test the same behavior. This is a good candidate for parametrization per coding guidelines.

@pytest.mark.parametrize("stat", ["p50", "p90", "p99"])  # fmt: skip
def test_aggregated_metric_with_percentile(self, tmp_path, stat):
    """Test dynamic resolution with percentiles."""
    config_file = tmp_path / "config.yaml"
    config_file.write_text(f"""
visualization:
  multi_run_defaults:
    - test_plot
  multi_run_plots:
    test_plot:
      type: scatter
      x: request_latency_{stat}
      y: request_throughput_avg
  single_run_defaults: []
  single_run_plots: {{}}
""")
    config = PlotConfig(config_file)
    specs = config.get_multi_run_plot_specs()
    assert specs[0].metrics[0].stat == stat
src/aiperf/plot/config.py (5)

133-136: Blocking file I/O detected.

Per coding guidelines, I/O operations should use async/await. The synchronous open() call here blocks the event loop if this code runs in an async context.

If this module is used in an async context, consider using aiofiles or wrapping in asyncio.to_thread(). However, if this is only called during initialization/startup (which appears to be the case), blocking I/O may be acceptable.
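
If that ever changes, a minimal sketch of the to_thread approach (assuming PyYAML, which YAML config loading already implies):

```python
import asyncio
from pathlib import Path

import yaml

async def load_yaml_async(path: Path) -> dict:
    """Read and parse the config file without blocking the event loop's thread."""
    text = await asyncio.to_thread(path.read_text)
    return yaml.safe_load(text) or {}
```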


92-92: Consider using logging instead of print.

Direct print() statements bypass the logging infrastructure and cannot be controlled via log levels. This applies to lines 92, 108-110, and 113 as well.

-            print(f"Using config: {self.custom_path}")
+            logger.info(f"Using config: {self.custom_path}")

341-355: Calling get_experiment_classification_config() inside _preset_to_plot_spec is inefficient.

This method is called for every plot preset in a loop. Each call re-parses the same config section repeatedly.

Cache the experiment classification config at the class level during initialization or pass it as a parameter:

 def __init__(self, config_path: Path | None = None, verbose: bool = False) -> None:
     ...
     self.resolved_path = self._resolve_config_path()
     self.config = self._load_yaml()
+    self._experiment_classification_config = self._parse_experiment_classification_config()

Then reference self._experiment_classification_config in _preset_to_plot_spec.


150-153: Broad exception catch re-raises as ValueError, potentially masking the original exception type.

The except Exception clause catches all exceptions including the already-raised ValueError from lines 139-146, wrapping them again. This could result in nested ValueError chains.

Consider catching only non-ValueError exceptions or restructuring the try/except:

-        except Exception as e:
-            raise ValueError(
-                f"Failed to load YAML config from {self.resolved_path}: {e}"
-            ) from e
+        except ValueError:
+            raise  # Re-raise validation errors as-is
+        except Exception as e:
+            raise ValueError(
+                f"Failed to load YAML config from {self.resolved_path}: {e}"
+            ) from e

155-259: Significant code duplication between get_multi_run_plot_specs and get_single_run_plot_specs.

Both methods follow the same pattern: validate list/dict types, iterate defaults, look up presets, call _preset_to_plot_spec, and handle errors identically.

Extract a shared helper method:

def _get_plot_specs(
    self, defaults_key: str, presets_key: str, plot_type_name: str
) -> list[PlotSpec]:
    viz_config = self.config.get("visualization", {})
    defaults = viz_config.get(defaults_key, [])
    if not isinstance(defaults, list):
        raise ValueError(...)
    presets = viz_config.get(presets_key, {})
    if not isinstance(presets, dict):
        raise ValueError(...)
    # ... shared logic
src/aiperf/plot/core/plot_generator.py (1)

394-400: Silent exception swallowing could mask legitimate errors.

The except Exception: pass silently ignores all exceptions from MetricRegistry.get_class(), including potential bugs or misconfigurations beyond "metric not found."

Log at debug level to aid troubleshooting:

         try:
             metric_class = MetricRegistry.get_class(metric_tag)
             if metric_class.has_flags(MetricFlags.LARGER_IS_BETTER):
                 return PlotMetricDirection.HIGHER
             return PlotMetricDirection.LOWER
-        except Exception:
-            pass
+        except KeyError:
+            pass  # Metric not in registry, check derived metrics
+        except Exception as e:
+            logger = logging.getLogger(__name__)
+            logger.debug(f"Error checking metric direction for '{metric_tag}': {e}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c082d6 and 4ecdd10.

⛔ Files ignored due to path filters (9)
  • docs/diagrams/plot_examples/multi_run/config_experiment_classification/pareto_curve_throughput_per_gpu_vs_interactivity.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/config_experiment_classification/ttft_vs_throughput.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/pareto_curve_throughput_per_gpu_vs_interactivity.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/pareto_curve_throughput_per_gpu_vs_latency.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/theme_dark_mode/pareto_curve_throughput_per_gpu_vs_interactivity.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/theme_dark_mode/pareto_curve_throughput_per_gpu_vs_latency.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/theme_dark_mode/ttft_vs_throughput.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/multi_run/ttft_vs_throughput.png is excluded by !**/*.png
  • docs/diagrams/plot_examples/single_run/timeslices/timeslices_throughput_warning.png is excluded by !**/*.png
📒 Files selected for processing (24)
  • docs/tutorials/plot.md (3 hunks)
  • src/aiperf/cli.py (2 hunks)
  • src/aiperf/cli_utils.py (3 hunks)
  • src/aiperf/common/enums/__init__.py (2 hunks)
  • src/aiperf/common/enums/metric_enums.py (1 hunks)
  • src/aiperf/plot/cli_runner.py (4 hunks)
  • src/aiperf/plot/config.py (1 hunks)
  • src/aiperf/plot/constants.py (1 hunks)
  • src/aiperf/plot/core/data_loader.py (7 hunks)
  • src/aiperf/plot/core/plot_generator.py (12 hunks)
  • src/aiperf/plot/core/plot_specs.py (3 hunks)
  • src/aiperf/plot/default_plot_config.yaml (1 hunks)
  • src/aiperf/plot/exporters/png/multi_run.py (6 hunks)
  • src/aiperf/plot/exporters/png/single_run.py (2 hunks)
  • src/aiperf/plot/handlers/multi_run_handlers.py (5 hunks)
  • src/aiperf/plot/logging.py (3 hunks)
  • src/aiperf/plot/metric_names.py (3 hunks)
  • src/aiperf/plot/plot_controller.py (5 hunks)
  • tests/unit/plot/test_cli_runner.py (1 hunks)
  • tests/unit/plot/test_experiment_classification.py (1 hunks)
  • tests/unit/plot/test_plot_config.py (1 hunks)
  • tests/unit/plot/test_plot_controller.py (1 hunks)
  • tests/unit/plot/test_plot_generator.py (10 hunks)
  • tests/unit/plot/test_png_exporter.py (28 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use async/await for all I/O operations; never use time.sleep() or blocking calls
Always use orjson for JSON operations: orjson.loads(s) and orjson.dumps(d)
All functions must have type hints on parameters and return types
Use Python 3.10+ union syntax (|) instead of typing.Union; use match/case for pattern matching; use @DataClass(slots=True)

Files:

  • src/aiperf/plot/constants.py
  • tests/unit/plot/test_plot_controller.py
  • src/aiperf/common/enums/__init__.py
  • tests/unit/plot/test_experiment_classification.py
  • tests/unit/plot/test_plot_config.py
  • src/aiperf/plot/logging.py
  • src/aiperf/plot/metric_names.py
  • src/aiperf/common/enums/metric_enums.py
  • src/aiperf/cli_utils.py
  • src/aiperf/plot/plot_controller.py
  • src/aiperf/plot/handlers/multi_run_handlers.py
  • src/aiperf/cli.py
  • src/aiperf/plot/core/plot_specs.py
  • src/aiperf/plot/config.py
  • tests/unit/plot/test_png_exporter.py
  • src/aiperf/plot/exporters/png/multi_run.py
  • src/aiperf/plot/core/data_loader.py
  • tests/unit/plot/test_cli_runner.py
  • src/aiperf/plot/exporters/png/single_run.py
  • tests/unit/plot/test_plot_generator.py
  • src/aiperf/plot/cli_runner.py
  • src/aiperf/plot/core/plot_generator.py
**/*test*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test files must use pytest with fixtures, helpers, and @pytest.mark.parametrize; import statements at the top; use # fmt: skip for long parameterize blocks

Files:

  • tests/unit/plot/test_plot_controller.py
  • tests/unit/plot/test_experiment_classification.py
  • tests/unit/plot/test_plot_config.py
  • tests/unit/plot/test_png_exporter.py
  • tests/unit/plot/test_cli_runner.py
  • tests/unit/plot/test_plot_generator.py
🧠 Learnings (2)
📚 Learning: 2025-11-25T00:08:56.802Z
Learnt from: CR
Repo: ai-dynamo/aiperf PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-25T00:08:56.802Z
Learning: Applies to **/config/*.py : Configuration classes must inherit from BaseConfig and include Field(description='...') for all Pydantic fields

Applied to files:

  • src/aiperf/plot/core/plot_specs.py
📚 Learning: 2025-11-25T00:08:56.802Z
Learnt from: CR
Repo: ai-dynamo/aiperf PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-25T00:08:56.802Z
Learning: Applies to **/models/*.py : Data models must inherit from AIPerfBaseModel and use Field(description='...') for all Pydantic fields

Applied to files:

  • src/aiperf/plot/core/plot_specs.py
  • src/aiperf/plot/core/data_loader.py
🧬 Code graph analysis (15)
src/aiperf/common/enums/__init__.py (1)
src/aiperf/common/enums/metric_enums.py (1)
  • PlotMetricDirection (417-424)
tests/unit/plot/test_experiment_classification.py (2)
src/aiperf/plot/core/data_loader.py (2)
  • _classify_experiment_type (785-811)
  • _extract_experiment_group (813-843)
src/aiperf/plot/core/plot_specs.py (1)
  • ExperimentClassificationConfig (48-73)
tests/unit/plot/test_plot_config.py (2)
src/aiperf/plot/config.py (4)
  • PlotConfig (40-448)
  • get_multi_run_plot_specs (155-206)
  • get_single_run_plot_specs (208-259)
  • get_experiment_classification_config (261-289)
src/aiperf/plot/core/plot_specs.py (5)
  • DataSource (76-82)
  • MetricSpec (97-109)
  • PlotSpec (112-173)
  • PlotType (85-94)
  • TimeSlicePlotSpec (176-183)
src/aiperf/plot/metric_names.py (1)
src/aiperf/common/enums/metric_enums.py (2)
  • MetricFlags (612-708)
  • MetricType (399-414)
src/aiperf/plot/plot_controller.py (6)
src/aiperf/plot/config.py (3)
  • get_experiment_classification_config (261-289)
  • get_multi_run_plot_specs (155-206)
  • get_single_run_plot_specs (208-259)
src/aiperf/plot/core/data_loader.py (1)
  • DataLoader (153-926)
src/aiperf/plot/core/mode_detector.py (2)
  • ModeDetector (26-240)
  • VisualizationMode (19-23)
src/aiperf/plot/exporters/png/multi_run.py (2)
  • MultiRunPNGExporter (25-233)
  • export (37-81)
src/aiperf/plot/exporters/png/single_run.py (2)
  • SingleRunPNGExporter (24-129)
  • export (39-77)
src/aiperf/plot/logging.py (1)
  • setup_plot_logging (56-108)
src/aiperf/plot/handlers/multi_run_handlers.py (1)
src/aiperf/plot/exporters/png/multi_run.py (1)
  • _extract_experiment_types (202-233)
src/aiperf/cli.py (2)
src/aiperf/cli_utils.py (1)
  • exit_on_error (62-125)
src/aiperf/plot/cli_runner.py (1)
  • run_plot_controller (11-54)
src/aiperf/plot/config.py (2)
src/aiperf/plot/core/plot_specs.py (7)
  • DataSource (76-82)
  • ExperimentClassificationConfig (48-73)
  • MetricSpec (97-109)
  • PlotSpec (112-173)
  • PlotType (85-94)
  • Style (15-45)
  • TimeSlicePlotSpec (176-183)
src/aiperf/plot/metric_names.py (3)
  • get_aggregated_metrics (110-127)
  • get_request_metrics (130-149)
  • get_timeslice_metrics (152-168)
tests/unit/plot/test_png_exporter.py (3)
src/aiperf/plot/core/plot_specs.py (6)
  • DataSource (76-82)
  • MetricSpec (97-109)
  • PlotSpec (112-173)
  • PlotType (85-94)
  • Style (15-45)
  • TimeSlicePlotSpec (176-183)
src/aiperf/plot/exporters/png/multi_run.py (2)
  • export (37-81)
  • MultiRunPNGExporter (25-233)
src/aiperf/plot/exporters/png/single_run.py (1)
  • export (39-77)
src/aiperf/plot/core/data_loader.py (2)
src/aiperf/plot/core/plot_specs.py (1)
  • ExperimentClassificationConfig (48-73)
src/aiperf/common/protocols.py (1)
  • warning (71-71)
tests/unit/plot/test_cli_runner.py (1)
src/aiperf/plot/constants.py (2)
  • PlotMode (32-35)
  • PlotTheme (38-42)
src/aiperf/plot/exporters/png/single_run.py (3)
src/aiperf/plot/exporters/png/multi_run.py (1)
  • export (37-81)
src/aiperf/plot/exporters/base.py (1)
  • export (50-57)
src/aiperf/plot/core/plot_specs.py (1)
  • PlotSpec (112-173)
tests/unit/plot/test_plot_generator.py (1)
src/aiperf/plot/core/plot_generator.py (2)
  • _prepare_groups (271-366)
  • detect_directional_outliers (72-112)
src/aiperf/plot/cli_runner.py (3)
src/aiperf/cli.py (1)
  • plot (39-77)
src/aiperf/plot/constants.py (2)
  • PlotMode (32-35)
  • PlotTheme (38-42)
src/aiperf/plot/plot_controller.py (2)
  • PlotController (18-168)
  • run (68-79)
src/aiperf/plot/core/plot_generator.py (2)
src/aiperf/common/enums/metric_enums.py (3)
  • PlotMetricDirection (417-424)
  • MetricFlags (612-708)
  • has_flags (691-694)
src/aiperf/plot/metric_names.py (1)
  • get_metric_display_name (91-107)
🪛 markdownlint-cli2 (0.18.1)
docs/tutorials/plot.md

253-253: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


388-388: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🪛 Ruff (0.14.7)
src/aiperf/plot/metric_names.py

34-42: Consider iterable unpacking instead of concatenation

(RUF005)


45-56: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

src/aiperf/plot/plot_controller.py

52-52: Do not catch blind exception: Exception

(BLE001)


142-142: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/plot/core/plot_specs.py

152-155: Prefer TypeError exception for invalid type

(TRY004)


152-155: Avoid specifying long messages outside the exception class

(TRY003)


158-160: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/plot/config.py

89-91: Avoid specifying long messages outside the exception class

(TRY003)


100-103: Avoid specifying long messages outside the exception class

(TRY003)


129-131: Avoid specifying long messages outside the exception class

(TRY003)


139-141: Prefer TypeError exception for invalid type

(TRY004)


139-141: Abstract raise to an inner function

(TRY301)


139-141: Avoid specifying long messages outside the exception class

(TRY003)


144-146: Abstract raise to an inner function

(TRY301)


144-146: Avoid specifying long messages outside the exception class

(TRY003)


148-148: Consider moving this statement to an else block

(TRY300)


151-153: Avoid specifying long messages outside the exception class

(TRY003)


169-172: Prefer TypeError exception for invalid type

(TRY004)


169-172: Avoid specifying long messages outside the exception class

(TRY003)


176-179: Prefer TypeError exception for invalid type

(TRY004)


176-179: Avoid specifying long messages outside the exception class

(TRY003)


185-187: Abstract raise to an inner function

(TRY301)


185-187: Avoid specifying long messages outside the exception class

(TRY003)


201-204: Avoid specifying long messages outside the exception class

(TRY003)


222-225: Prefer TypeError exception for invalid type

(TRY004)


222-225: Avoid specifying long messages outside the exception class

(TRY003)


229-232: Prefer TypeError exception for invalid type

(TRY004)


229-232: Avoid specifying long messages outside the exception class

(TRY003)


238-240: Abstract raise to an inner function

(TRY301)


238-240: Avoid specifying long messages outside the exception class

(TRY003)


254-257: Avoid specifying long messages outside the exception class

(TRY003)


279-282: Prefer TypeError exception for invalid type

(TRY004)


279-282: Avoid specifying long messages outside the exception class

(TRY003)


287-289: Avoid specifying long messages outside the exception class

(TRY003)


308-310: Prefer TypeError exception for invalid type

(TRY004)


308-310: Avoid specifying long messages outside the exception class

(TRY003)


314-314: Avoid specifying long messages outside the exception class

(TRY003)


339-339: Avoid specifying long messages outside the exception class

(TRY003)


415-418: Avoid specifying long messages outside the exception class

(TRY003)


438-441: Avoid specifying long messages outside the exception class

(TRY003)

tests/unit/plot/test_png_exporter.py

852-852: Unused method argument: sample_plot_specs

(ARG002)


946-946: Unused method argument: sample_plot_specs

(ARG002)

src/aiperf/plot/exporters/png/multi_run.py

121-121: Unused method argument: available_metrics

(ARG002)

tests/unit/plot/test_plot_generator.py

361-361: Unpacked variable groups is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


361-361: Unpacked variable display_names is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


382-382: Unpacked variable groups1 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


382-382: Unpacked variable display_names1 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


385-385: Unpacked variable groups2 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


385-385: Unpacked variable display_names2 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


399-399: Unpacked variable groups is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


399-399: Unpacked variable display_names is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1083-1083: Unpacked variable display_names is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1096-1096: Unpacked variable groups is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1096-1096: Unpacked variable display_names is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1127-1127: Unpacked variable display_names is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

src/aiperf/plot/core/plot_generator.py

398-398: Consider moving this statement to an else block

(TRY300)


399-400: try-except-pass detected, consider logging the exception

(S110)


399-399: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: integration-tests (macos-latest, 3.13)
  • GitHub Check: integration-tests (ubuntu-latest, 3.11)
  • GitHub Check: integration-tests (ubuntu-latest, 3.13)
  • GitHub Check: integration-tests (macos-latest, 3.12)
  • GitHub Check: integration-tests (macos-latest, 3.10)
  • GitHub Check: integration-tests (macos-latest, 3.11)
  • GitHub Check: build (macos-latest, 3.11)
  • GitHub Check: build (ubuntu-latest, 3.12)
  • GitHub Check: integration-tests (ubuntu-latest, 3.12)
  • GitHub Check: integration-tests (ubuntu-latest, 3.10)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: build (macos-latest, 3.12)
  • GitHub Check: build (ubuntu-latest, 3.11)
  • GitHub Check: build (ubuntu-latest, 3.10)
  • GitHub Check: build (ubuntu-latest, 3.13)
  • GitHub Check: build (macos-latest, 3.13)
🔇 Additional comments (46)
src/aiperf/common/enums/metric_enums.py (1)

417-424: PlotMetricDirection enum looks consistent and clear

The new PlotMetricDirection enum integrates cleanly with existing enum patterns (CaseInsensitiveStrEnum) and the HIGHER/LOWER semantics are well documented for plotting use.

src/aiperf/cli.py (1)

38-76: Config and verbose wiring for plot command looks solid

The new config and verbose parameters are well documented and correctly propagated:

  • exit_on_error(..., show_traceback=verbose) matches the “show detailed error tracebacks” description.
  • run_plot_controller(..., theme=theme, config=config, verbose=verbose) ensures PlotController receives the same options the user passed on the CLI.
  • Examples in the docstring clearly illustrate the new flags.

No issues from a correctness standpoint.

src/aiperf/common/enums/__init__.py (1)

51-75: PlotMetricDirection correctly added to the public enums surface

PlotMetricDirection is imported from metric_enums and included in __all__, so consumers using from aiperf.common.enums import PlotMetricDirection will see the new enum as expected. This matches how other metric-related enums are exposed.

Also applies to: 113-179

src/aiperf/plot/cli_runner.py (1)

7-8: New config and verbose parameters are correctly threaded through to PlotController

  • Signature and type hints align with the CLI entrypoint and PlotController’s config_path/verbose parameters.
  • config_path = Path(config) if config else None is the expected conversion.
  • The docstring updates accurately describe the new arguments.

End-to-end, this keeps the CLI and runner surfaces consistent.

Also applies to: 11-48

tests/unit/plot/test_plot_controller.py (1)

274-279: Updated assertion keeps the test resilient while checking the important signal

Switching the check to:

captured = capsys.readouterr()
assert "Failed to load run" in captured.out

focuses the test on the substantive message rather than its exact formatting (e.g., dropped “Warning:” prefix). That keeps the test useful while reducing brittleness against cosmetic log changes.

tests/unit/plot/test_cli_runner.py (1)

284-299: LGTM!

The test correctly validates that verbose is forwarded to PlotController and that config_path=None is explicitly passed. The test structure follows pytest conventions with proper use of fixtures and mocks.

src/aiperf/plot/exporters/png/single_run.py (1)

39-59: LGTM!

The signature update to accept plot_specs aligns with the MultiRunPNGExporter API and enables config-driven plot generation. Type hints are properly specified.

src/aiperf/cli_utils.py (2)

71-88: LGTM!

The show_traceback parameter provides useful control over error output verbosity. Default True maintains backward compatibility.


104-113: LGTM!

The conditional traceback display is correctly implemented, only showing the full exception details when show_traceback is True.

tests/unit/plot/test_plot_generator.py (2)

116-122: LGTM!

Test assertions correctly updated to validate the new Pareto plot behavior with directional arrows in axis labels and optimal direction subtitle in titles.


941-953: LGTM!

Outlier detection test assertions correctly validate directional behavior for latency vs throughput metrics.

tests/unit/plot/test_experiment_classification.py (5)

1-17: LGTM!

Well-structured test module with proper imports, type hints, and pytest conventions. The test class is appropriately named and scoped.


19-68: LGTM!

Good coverage of basic classification scenarios with baseline and treatment pattern matching, including the important test for multiple patterns.


70-156: LGTM!

Excellent edge case coverage including first-match-wins behavior, default fallback, case sensitivity, and full path matching. These tests validate important classification semantics.


158-234: LGTM!

Good coverage of _extract_experiment_group behavior including top-level runs, concurrency subdirectories, and fallback when patterns don't match.


236-317: LGTM!

Well-designed parametrized tests for various glob patterns. The nested concurrency directory tests properly validate the immediate parent extraction logic.

src/aiperf/plot/handlers/multi_run_handlers.py (4)

61-85: LGTM with a note on duplication.

The helper correctly extracts experiment types from the DataFrame. Note that similar logic exists in src/aiperf/plot/exporters/png/multi_run.py (lines 201-232). Consider extracting to a shared utility if this pattern grows.


87-111: LGTM!

Clean implementation for extracting group display names. Mirrors the pattern of _extract_experiment_types appropriately.


143-157: LGTM!

Correctly integrates the new helper methods and passes the extracted mappings to create_pareto_plot.


189-203: LGTM!

Same pattern correctly applied for scatter line plots.

docs/tutorials/plot.md (3)

63-65: LGTM!

Good placement for the tip about Experiment Classification, providing early guidance for users.


180-221: LGTM!

Clear documentation of grouping customization with practical YAML examples.


386-388: LGTM!

Good best practice tip. The blank line flagged by markdownlint between blockquotes is a minor formatting issue that doesn't affect rendering in most Markdown processors.

src/aiperf/plot/core/plot_specs.py (2)

48-74: LGTM!

ExperimentClassificationConfig correctly inherits from BaseConfig and all fields have proper Field(description=...) annotations. Based on learnings, this follows the project conventions.


124-133: LGTM!

Clear documentation of the updated label_by and group_by field semantics.

src/aiperf/plot/plot_controller.py (3)

50-56: Logging fallback is reasonable despite static analysis hint.

The bare Exception catch (BLE001) is acceptable here since this is a best-effort logging setup with a graceful fallback to console-only logging. The fallback ensures the application continues even if file logging fails.


145-153: LGTM - plot specs and classification config are properly wired to exporter.

The PlotConfig integration correctly retrieves plot specs and classification config, passing them to the MultiRunPNGExporter.


166-168: LGTM - single-run export correctly passes plot specs.

The single-run export path properly retrieves and forwards plot specs to the SingleRunPNGExporter.

tests/unit/plot/test_png_exporter.py (2)

199-269: LGTM - Multi-run plot specs fixture is well-structured.

The sample_multi_run_plot_specs fixture correctly demonstrates various plot types with proper metric configurations including stats (avg, p50) for aggregated data sources.


1850-2050: Thorough validation tests for PlotSpec list normalization.

The TestPlotSpecListValidation class provides comprehensive coverage for the list-to-string normalization behavior, including edge cases for single elements, multiple elements, None values, and error conditions for plain strings.

src/aiperf/plot/exporters/png/multi_run.py (1)

176-200: LGTM - DataFrame construction with classification support is well-implemented.

The logic correctly handles:

  • Display name mapping with fallback via fillna()
  • Informative logging of unique experiment groups and types
  • Graceful handling when classification_config is None
tests/unit/plot/test_plot_config.py (2)

1-22: LGTM - Well-structured test module with proper imports.

The test file follows pytest conventions with clear class organization. All necessary imports from the plot configuration module are present.


416-566: Thorough classification override tests.

The TestExperimentClassificationOverride class provides excellent coverage for the experiment classification feature, including:

  • Override behavior when enabled
  • Preservation of original groups when disabled
  • Override of all group types
  • Nested directory structures
src/aiperf/plot/default_plot_config.yaml (4)

1-44: Well-documented configuration file with clear user guidance.

The header provides excellent documentation on customization workflow, quick start guide, and the three-tier configuration priority. The defaults lists are comprehensive and cover the expected use cases.


45-80: Multi-run plot presets are correctly structured.

The multi-run plots define proper type, x/y metrics with stat suffixes, labels, groups, and titles. The configuration guide in comments (lines 49-56) clearly explains the groups/labels behavior.


81-158: Single-run plot presets with comprehensive coverage.

The single-run presets cover various visualization needs:

  • Per-request scatter plots (TTFT, ITL)
  • Time-series with percentiles
  • Throughput area charts
  • Timeslice histograms
  • Dual-axis GPU utilization overlay

The dual_axis configuration (lines 145-157) correctly defines primary/secondary styles and supplementary column.


159-185: Experiment classification documentation is clear and helpful.

The commented example provides clear guidance on:

  • Semantic color assignment (grey for baselines, green for treatments)
  • Pattern matching priority
  • Glob pattern syntax examples

The default is set to treatment which is a reasonable fallback.

src/aiperf/plot/config.py (1)

1-448: LGTM overall — the configuration loading logic is well-structured.

The three-tier config resolution (custom → user → default), auto-creation of user config, and structured YAML validation provide a solid foundation for extensible plot configuration.
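
For orientation, a condensed sketch of that resolution order, rendered as a free function; the real implementation also validates the custom path and auto-creates the user config:

```python
from pathlib import Path

PACKAGED_DEFAULTS = Path(__file__).parent / "default_plot_config.yaml"

def resolve_config_path(custom_path: Path | None) -> Path:
    """Three-tier resolution: explicit config path, then ~/.aiperf/plot_config.yaml, then packaged defaults."""
    if custom_path is not None:
        return custom_path  # PlotConfig also checks that this file exists
    user_config = Path.home() / ".aiperf" / "plot_config.yaml"
    if user_config.exists():
        return user_config
    # On first run, PlotConfig copies the defaults to the user path so they can be customized (not shown).
    return PACKAGED_DEFAULTS
```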

src/aiperf/plot/core/data_loader.py (3)

59-66: LGTM — new RunMetadata fields follow the data model conventions.

The experiment_type and experiment_group fields properly use Field(description=...) as required by the coding guidelines. Based on learnings, data models must inherit from AIPerfBaseModel and use Field descriptions.


161-172: LGTM — DataLoader initialization properly accepts optional classification config.

The optional classification_config parameter with None default maintains backward compatibility.


785-811: LGTM — _classify_experiment_type correctly implements pattern-based classification.

The method properly checks baseline patterns first, then treatment patterns, and falls back to the config default. The use of fnmatch for glob pattern matching is appropriate.
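
A condensed, free-standing sketch of that flow; the real method takes the run path and an ExperimentClassificationConfig, whose attribute names are not reproduced here:

```python
from fnmatch import fnmatch

def classify_experiment_type(
    run_name: str,
    baseline_patterns: list[str],
    treatment_patterns: list[str],
    default: str = "treatment",
) -> str:
    """Baseline globs win, then treatment globs, then the configured default."""
    if any(fnmatch(run_name, pattern) for pattern in baseline_patterns):
        return "baseline"
    if any(fnmatch(run_name, pattern) for pattern in treatment_patterns):
        return "treatment"
    return default
```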

src/aiperf/plot/core/plot_generator.py (5)

471-534: LGTM — optimal quadrant shading implementation is well-designed.

The _add_optimal_quadrant_shading method correctly:

  • Guards against missing directions or empty data
  • Calculates rectangle bounds based on metric directions
  • Uses a subtle green overlay that doesn't obscure data
  • Places the "★ Optimal" annotation at the correct corner with appropriate anchoring

320-356: LGTM — semantic coloring for experiment groups is clear and well-documented.

The logic correctly assigns:

  • Grey for all baselines
  • NVIDIA Green for the first treatment
  • Seaborn "bright" palette for additional treatments

The ordered output (baselines first, then treatments) ensures consistent legend ordering.
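
A simplified sketch of that assignment; the grey shades and helper name are illustrative (#76B900 is NVIDIA's brand green), and the actual logic in plot_generator.py may differ:

```python
import seaborn as sns

NVIDIA_GREEN = "#76B900"
GREY_SHADES = ["#6e6e6e", "#9e9e9e", "#c4c4c4"]  # illustrative baseline greys

def assign_group_colors(baselines: list[str], treatments: list[str]) -> dict[str, str]:
    """Baselines get greys (listed first), the first treatment NVIDIA green, the rest 'bright' palette colors."""
    colors: dict[str, str] = {}
    for i, name in enumerate(baselines):
        colors[name] = GREY_SHADES[i % len(GREY_SHADES)]
    extra: list[str] = []
    if len(treatments) > 1:
        extra = sns.color_palette("bright", len(treatments) - 1).as_hex()
    for i, name in enumerate(treatments):
        colors[name] = NVIDIA_GREEN if i == 0 else extra[i - 1]
    return colors  # insertion order keeps baselines ahead of treatments in the legend
```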


449-458: Edge case: y_word assignment may be incorrect for LOWER direction.

When y_dir == PlotMetricDirection.LOWER, the ternary "high" if y_dir == PlotMetricDirection.HIGHER else "low" correctly yields y_word = "low" (line 463). The quadrant determination at lines 455-458 also matches its "lower-right" comment (which implies x is HIGHER), so both cases are handled correctly.

The quadrant determination logic correctly maps all four combinations of x/y directions to their respective corner names.


578-589: LGTM — directional arrows enhance axis label clarity.

Adding / arrows to axis labels and generating explanatory subtitles helps users understand optimization direction at a glance.


1407-1414: The ZeroDivisionError concern is technically inaccurate. The code range(int(max_slice) + 1) does not raise ZeroDivisionError if max_slice is negative; it simply creates an empty range. The actual risk is empty tick lists if max_slice is unexpectedly small or if upstream data validation is insufficient. Verify upstream callers to confirm whether max_slice is guaranteed to be a valid non-negative integer and whether empty dataframes are handled before reaching this code.
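
A quick illustration of the empty-range behavior described above:

```python
max_slice = -3
print(list(range(int(max_slice) + 1)))  # [] — empty range, no exception; downstream tick lists would be empty
```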

@ilana-n ilana-n changed the title from "feat: add experiment classification and plot configuration system" to "feat: AIPerf plot configuration system" on Dec 1, 2025
@ilana-n ilana-n force-pushed the ilana/plot_configure_yaml branch from 4ecdd10 to 60c29d5 on December 2, 2025 01:13
@ilana-n ilana-n force-pushed the ilana/plot_configure_yaml branch from 60c29d5 to f773efe on December 2, 2025 18:15

github-actions bot commented Dec 2, 2025

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@7e483d94780acc98bc60801dbf483063e6c40026

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@7e483d94780acc98bc60801dbf483063e6c40026

Last updated for commit: 7e483d9

@ilana-n ilana-n requested a review from lkomali December 3, 2025 18:08