feat: AIPerf plot configuration system #497
base: feature/plot
Conversation
Try out this PR — Quick install:
`pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ilana/plot_configure_yaml`
Recommended with a virtual environment (using uv):
`uv venv --python 3.12 && source .venv/bin/activate`
`uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ilana/plot_configure_yaml`
@coderabbitai review
✅ Actions performed: Review triggered.
Pull request overview
This PR introduces a comprehensive experiment classification and plot configuration system for AIPerf, enabling users to customize visualizations and distinguish between baseline and treatment runs with semantic coloring.
Key changes:
- Added YAML-based plot configuration system with auto-creation of user config on first run
- Implemented experiment classification to categorize runs as baseline/treatment with semantic colors
- Enhanced multi-run plots with optimal direction indicators and quadrant shading
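To make the classification idea above concrete, here is a rough sketch of how glob-pattern-based baseline/treatment classification can work; the function and parameter names are illustrative, not the PR's actual API:

```python
from fnmatch import fnmatch

def classify_run(
    run_dir_name: str,
    baseline_patterns: list[str],
    treatment_patterns: list[str],
    default: str = "treatment",
) -> str:
    """Classify a run directory as baseline or treatment; the first matching pattern wins."""
    for pattern in baseline_patterns:
        if fnmatch(run_dir_name, pattern):
            return "baseline"
    for pattern in treatment_patterns:
        if fnmatch(run_dir_name, pattern):
            return "treatment"
    return default

# Directories named "baseline_*" are treated as baselines, everything else as treatments.
print(classify_run("baseline_moderate_io_isl100_osl200_streaming", ["baseline_*"], ["*"]))  # baseline
print(classify_run("speculative_decode_run", ["baseline_*"], ["*"]))                        # treatment
```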
Reviewed changes
Copilot reviewed 24 out of 33 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/plot/test_png_exporter.py | Updated tests to use config-driven plot specs instead of hardcoded constants |
| tests/unit/plot/test_plot_generator.py | Added tests for experiment group coloring and display name handling |
| tests/unit/plot/test_plot_controller.py | Updated warning message assertion for failed run loading |
| tests/unit/plot/test_plot_config.py | New comprehensive test suite for YAML config loading and validation |
| tests/unit/plot/test_experiment_classification.py | New test suite for baseline/treatment classification logic |
| tests/unit/plot/test_cli_runner.py | Updated test to include new verbose and config_path parameters |
| src/aiperf/plot/plot_controller.py | Added config and classification support to controller |
| src/aiperf/plot/metric_names.py | Added helper functions to retrieve metrics by data source |
| src/aiperf/plot/logging.py | Enhanced logging with console-only fallback and verbose mode support |
| src/aiperf/plot/handlers/multi_run_handlers.py | Added experiment type and display name extraction for plots |
| src/aiperf/plot/exporters/png/single_run.py | Updated to accept plot specs as parameter |
| src/aiperf/plot/exporters/png/multi_run.py | Added classification config support and experiment group handling |
| src/aiperf/plot/default_plot_config.yaml | New default YAML configuration file with plot presets |
| src/aiperf/plot/core/plot_specs.py | Added ExperimentClassificationConfig and list validation for grouping fields |
| src/aiperf/plot/core/plot_generator.py | Added semantic coloring, optimal direction indicators, and quadrant shading |
| src/aiperf/plot/core/data_loader.py | Implemented experiment classification and group extraction logic |
| src/aiperf/plot/constants.py | Added derived metric direction mappings |
| src/aiperf/plot/config.py | New PlotConfig class for YAML-based configuration loading |
| src/aiperf/plot/cli_runner.py | Added config and verbose parameters |
| src/aiperf/common/enums/metric_enums.py | Added PlotMetricDirection enum |
| src/aiperf/common/enums/__init__.py | Exported PlotMetricDirection |
| src/aiperf/cli_utils.py | Added show_traceback parameter to exit_on_error |
| src/aiperf/cli.py | Added config and verbose flags to plot command |
| docs/tutorials/plot.md | Added documentation for experiment classification and grouping |
Codecov Report: ❌ Patch coverage is
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
This pull request adds YAML-based plot configuration management, experiment classification (baseline/treatment grouping), and enhanced visualization features, including metric directionality indicators and optimal quadrant shading, to NVIDIA AIPerf's plotting system.
Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60–90 minutes
Areas requiring extra attention:
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 5
🧹 Nitpick comments (28)
src/aiperf/plot/logging.py (3)
24-55: Global handler reset in `setup_console_only_logging` can clobber prior logging config
Implementation is fine for a standalone CLI, but `for existing_handler in root_logger.handlers[:]` followed by `removeHandler` will drop any previously configured root handlers. If `aiperf.plot` is ever used as a library inside a larger process with its own logging, this will override that configuration. If you want to keep this reusable, consider either:
- Operating on a dedicated logger hierarchy instead of the root, or
- Only clearing handlers you know you own (e.g., marked by a custom attribute).
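A minimal sketch of the second option, assuming a marker attribute named `_aiperf_plot_handler` (the name is hypothetical, not part of the PR):

```python
import logging

def install_console_handler() -> logging.Handler:
    """Attach a console handler tagged as owned by the plot logging setup."""
    handler = logging.StreamHandler()
    handler._aiperf_plot_handler = True  # hypothetical ownership marker
    logging.getLogger().addHandler(handler)
    return handler

def remove_owned_handlers() -> None:
    """Remove only the handlers this module installed, leaving host-app handlers alone."""
    root = logging.getLogger()
    for handler in root.handlers[:]:
        if getattr(handler, "_aiperf_plot_handler", False):
            root.removeHandler(handler)
```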
61-66: Console verbosity docs vs implementation are slightly out of sync
Docstring says, "Console output shows WARNING+ by default, or all logs when verbose (DEBUG/INFO level)." The current logic:

```python
console_level = level if level == "DEBUG" else "WARNING"
```

means only `log_level == "DEBUG"` yields full console output; `INFO` still shows WARNING+. That's fine for the current `PlotController` usage (`DEBUG` when verbose, `INFO` otherwise), but the function's docs suggest INFO could also be "verbose". Consider tightening either:
- The implementation (e.g., treat `INFO` as console-level INFO as well, as in the sketch below), or
- The docstring (e.g., say "all logs when `log_level == 'DEBUG'`, otherwise WARNING+").
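A one-line sketch of the implementation-side fix, reusing the existing `level` variable (illustrative only):

```python
# Treat both DEBUG and INFO as "verbose" console levels; anything else shows WARNING+.
console_level = level if level in ("DEBUG", "INFO") else "WARNING"
```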
79-81: Minor: prefer logging level constants instead of strings
`console_level` is currently a string ("DEBUG"/"WARNING"). `logging` happily accepts that, but using `logging.DEBUG`/`logging.WARNING` would be more idiomatic and avoids surprises if someone ever passes an unexpected string to `log_level`. Not urgent, but worth considering for readability and static tooling.
Also applies to: 91-91
src/aiperf/plot/metric_names.py (2)
33-57: Minor style: align list construction with Ruff's iterable-unpacking suggestion
The constructions:

```python
_AGGREGATED_METRICS: list[str] = MetricRegistry.tags_applicable_to(...) + [
    "output_token_throughput_per_gpu",
]
_REQUEST_METRICS: list[str] = (
    MetricRegistry.tags_applicable_to(...)
    + [
        "request_number",
        ...
    ]
)
```

work fine, but Ruff's RUF005 hint suggests iterable unpacking, which is a bit more idiomatic and avoids an intermediate list for concatenation. For example:

```python
_AGGREGATED_METRICS: list[str] = [
    *MetricRegistry.tags_applicable_to(
        MetricFlags.NONE,
        MetricFlags.NONE,
        MetricType.RECORD,
        MetricType.DERIVED,
    ),
    "output_token_throughput_per_gpu",
]

_REQUEST_METRICS: list[str] = [
    *MetricRegistry.tags_applicable_to(
        MetricFlags.NONE,
        MetricFlags.NONE,
        MetricType.RECORD,
    ),
    "request_number",
    "timestamp",
    "timestamp_s",
    "throughput_tokens_per_sec",
    "active_requests",
]
```

Purely a readability/micro-optimization tweak; behavior is unchanged.
59-69: Guard against accidental mutation of module-level metric lists
All four getters:

```python
def get_aggregated_metrics() -> list[str]:
    return _AGGREGATED_METRICS

def get_request_metrics() -> list[str]:
    return _REQUEST_METRICS

def get_timeslice_metrics() -> list[str]:
    return _TIMESLICE_METRICS

def get_gpu_metrics() -> list[str]:
    return _GPU_METRICS
```

return references to module-level lists. Any caller doing `get_aggregated_metrics().append(...)` will mutate the shared global state, which could lead to hard-to-track bugs in plotting/validation. If you intend these as read-only definitions, consider one of:
- Keep internal storage as tuples and return lists:

```python
_AGGREGATED_METRICS: tuple[str, ...] = (...)
# ...

def get_aggregated_metrics() -> list[str]:
    return list(_AGGREGATED_METRICS)
```
- Or keep them as lists but return copies:

```python
def get_gpu_metrics() -> list[str]:
    return list(_GPU_METRICS)
```

Either way, callers can safely modify the returned list without affecting global configuration.
Also applies to: 110-187
src/aiperf/plot/cli_runner.py (1)
52-54: `Logs:` line may be misleading when logging falls back to console-only
You always print:

```python
print(f"Logs: {output_dir / PLOT_LOG_FILE}")
```

But `PlotController` can fall back to `setup_console_only_logging` if file logging fails (e.g., permission issues), in which case that log file may not exist. If you want to avoid confusing users in that scenario, consider:
- Having `PlotController.run()` (or the controller instance) expose whether file logging was successfully configured and only print the log path when it is, or
- Adjusting the message, e.g., `Logs (if file logging enabled): ...` or similar.
src/aiperf/plot/constants.py (1)
63-68: `DERIVED_METRIC_DIRECTIONS` is a good central hook for non-registry metrics
The new mapping for derived metrics:

```python
DERIVED_METRIC_DIRECTIONS = {
    "output_token_throughput_per_gpu": True,
    "output_token_throughput_per_user": True,
}
```

plus the explanatory comment about True/False semantics provides a clear, centralized way for plotting code to infer direction for metrics that don't live in `MetricRegistry`. Just keep in mind that any future derived metrics added in data processing should also be added here (and to the display-name layer) to avoid silent "unknown direction" cases in plotting.
tests/unit/plot/test_plot_generator.py (6)
361-367: Prefix unused unpacked variables with underscore.
Static analysis correctly identifies that `groups` and `display_names` are unpacked but not used in this test. Per Python convention, prefix unused variables with `_`.

```diff
- groups, color_map, display_names = plot_generator._prepare_groups(df, "model")
+ _groups, color_map, _display_names = plot_generator._prepare_groups(df, "model")
```
382-392: Prefix unused unpacked variables with underscore.
Same issue: `groups1`, `groups2`, `display_names1`, and `display_names2` are unpacked but never used.

```diff
- groups1, colors1, display_names1 = plot_generator._prepare_groups(df1, "model")
+ _groups1, colors1, _display_names1 = plot_generator._prepare_groups(df1, "model")
  df2 = pd.DataFrame({"model": ["ModelX", "ModelY", "ModelZ"]})
- groups2, colors2, display_names2 = plot_generator._prepare_groups(df2, "model")
+ _groups2, colors2, _display_names2 = plot_generator._prepare_groups(df2, "model")
```
399-409: Prefix unused unpacked variables with underscore.

```diff
- groups, color_map, display_names = plot_generator._prepare_groups(df, "model")
+ _groups, color_map, _display_names = plot_generator._prepare_groups(df, "model")
```
1083-1087: Prefix unused unpacked variable with underscore.

```diff
- groups, color_map, display_names = plot_gen._prepare_groups(df, group_by=None)
+ groups, color_map, _display_names = plot_gen._prepare_groups(df, group_by=None)
```
1096-1106: Prefix unused unpacked variables with underscore.

```diff
- groups, color_map, display_names = plot_gen._prepare_groups(df, "model")
+ _groups, color_map, _display_names = plot_gen._prepare_groups(df, "model")
```
1127-1133: Prefix unused unpacked variable with underscore.

```diff
- groups, color_map, display_names = plot_gen._prepare_groups(
+ groups, color_map, _display_names = plot_gen._prepare_groups(
      df, "nonexistent_column"
  )
```

docs/tutorials/plot.md (1)
223-288: Add language specifier to code block.
The directory structure code block at line 253 should have a language specifier for consistency — change the opening fence to a `text` fence:

```text
artifacts/
├── baseline_moderate_io_isl100_osl200_streaming/   # → Grey (baseline; ISL=100, OSL=200)
```

src/aiperf/plot/core/plot_specs.py (1)
135-160: Consider using `TypeError` for type validation failures.
Static analysis suggests `TypeError` is more appropriate than `ValueError` when the issue is an incorrect type. This aligns with Python conventions where `TypeError` indicates wrong argument types.

```diff
 if isinstance(v, str):
-    raise ValueError(
+    raise TypeError(
         f"label_by and group_by must be provided as lists of strings, "
         f"not plain strings. Use [{v!r}] instead of {v!r}"
     )
 if isinstance(v, list):
     return " | ".join(v)
-raise ValueError(
+raise TypeError(
     f"label_by and group_by must be lists of strings, got {type(v).__name__}"
 )
```

src/aiperf/plot/plot_controller.py (1)
61-66: Inconsistent logging: uses `print()` after setting up logger.
The code sets up a logger but then uses `print()` for the classification message. For consistency with the logging infrastructure, use the logger instead.

```diff
- if classification_config:
-     print("Experiment classification enabled: grouping runs by baseline/treatment patterns")
+ if classification_config:
+     logger.info("Experiment classification enabled: grouping runs by baseline/treatment patterns")
```

tests/unit/plot/test_png_exporter.py (1)
45-196: Comprehensive fixture for single-run plot specs.
The `sample_plot_specs` fixture provides good coverage of various plot types (scatter, area, histogram, dual_axis) with proper metric specifications. The inline import of `Style` and `TimeSlicePlotSpec` at line 48 could be moved to the top-level imports for consistency.
Consider moving the import to the top of the file:

```diff
 from aiperf.plot.core.plot_specs import (
     DataSource,
     MetricSpec,
     PlotSpec,
     PlotType,
+    Style,
+    TimeSlicePlotSpec,
 )
```

src/aiperf/plot/exporters/png/multi_run.py (3)
37-43: Missing type hint for `classification_config` parameter.
The `classification_config` parameter lacks a type annotation. Based on the codebase context, this should be typed as `ExperimentClassificationConfig | None`.
As per coding guidelines requiring type hints on all parameters:

```diff
+from aiperf.plot.core.plot_specs import ExperimentClassificationConfig, PlotSpec
-from aiperf.plot.core.plot_specs import PlotSpec

 def export(
     self,
     runs: list[RunData],
     available_metrics: dict,
     plot_specs: list[PlotSpec],
-    classification_config=None,
+    classification_config: ExperimentClassificationConfig | None = None,
 ) -> list[Path]:
```
120-122: `available_metrics` parameter is unused in `_runs_to_dataframe`.
The static analysis correctly identifies that `available_metrics` is passed but not used. Either remove it or document why it's kept for future use.

```diff
 def _runs_to_dataframe(
-    self, runs: list[RunData], available_metrics: dict, classification_config=None
+    self, runs: list[RunData], classification_config: ExperimentClassificationConfig | None = None
 ) -> pd.DataFrame:
```

Then update the call site at line 59:

```diff
-        df = self._runs_to_dataframe(runs, available_metrics, classification_config)
+        df = self._runs_to_dataframe(runs, classification_config)
```
202-233: Potential issue: `iloc[0]` access on potentially empty group DataFrame.
In `_extract_experiment_types`, if `df[df[group_by] == group_val]` results in an empty DataFrame for some `group_val`, `iloc[0]` will raise an `IndexError`. While unlikely given the loop iterates over `df[group_by].unique()`, defensive coding would be safer.

```diff
 experiment_types = {}
 for group_val in df[group_by].unique():
     group_df = df[df[group_by] == group_val]
-    experiment_types[group_val] = group_df["experiment_type"].iloc[0]
+    if not group_df.empty:
+        experiment_types[group_val] = group_df["experiment_type"].iloc[0]
```

tests/unit/plot/test_plot_config.py (2)
72-106: Redundant environment variable and Path.home monkeypatch.
The test sets the `HOME` env var via `monkeypatch.setenv` (line 77) but also patches `Path.home()` directly (line 101). The `Path.home()` patch is sufficient and makes the env var setting unnecessary.

```diff
 def test_load_user_config(self, tmp_path, monkeypatch):
     """Test loading from user home config (~/.aiperf/plot_config.yaml)."""
     # Create fake home directory
     fake_home = tmp_path / "home"
     fake_home.mkdir()
-    monkeypatch.setenv("HOME", str(fake_home))

     # Create user config
     user_config_dir = fake_home / ".aiperf"
```
906-940: Consider using `@pytest.mark.parametrize` for percentile tests.
The tests for p50, p90, p99 percentiles create separate plot configs but test the same behavior. This is a good candidate for parametrization per coding guidelines.

```python
@pytest.mark.parametrize("stat", ["p50", "p90", "p99"])  # fmt: skip
def test_aggregated_metric_with_percentile(self, tmp_path, stat):
    """Test dynamic resolution with percentiles."""
    config_file = tmp_path / "config.yaml"
    config_file.write_text(f"""
visualization:
  multi_run_defaults:
    - test_plot
  multi_run_plots:
    test_plot:
      type: scatter
      x: request_latency_{stat}
      y: request_throughput_avg
  single_run_defaults: []
  single_run_plots: {{}}
""")
    config = PlotConfig(config_file)
    specs = config.get_multi_run_plot_specs()
    assert specs[0].metrics[0].stat == stat
```

src/aiperf/plot/config.py (5)
133-136: Blocking file I/O detected.
Per coding guidelines, I/O operations should use async/await. The synchronous `open()` call here blocks the event loop if this code runs in an async context.
If this module is used in an async context, consider using `aiofiles` or wrapping in `asyncio.to_thread()`. However, if this is only called during initialization/startup (which appears to be the case), blocking I/O may be acceptable.
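If the loader ever does need to run inside an event loop, a small sketch of the `asyncio.to_thread()` option (a hypothetical helper, not part of the PR):

```python
import asyncio
from pathlib import Path

import yaml

async def load_yaml_async(path: Path) -> dict:
    """Read and parse a YAML config without blocking the event loop."""
    text = await asyncio.to_thread(path.read_text)
    return yaml.safe_load(text) or {}
```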
92-92: Consider using logging instead of print.
Direct `print()` statements bypass the logging infrastructure and cannot be controlled via log levels. This applies to lines 92, 108-110, and 113 as well.

```diff
- print(f"Using config: {self.custom_path}")
+ logger.info(f"Using config: {self.custom_path}")
```
341-355: Calling `get_experiment_classification_config()` inside `_preset_to_plot_spec` is inefficient.
This method is called for every plot preset in a loop. Each call re-parses the same config section repeatedly.
Cache the experiment classification config at the class level during initialization or pass it as a parameter:

```diff
 def __init__(self, config_path: Path | None = None, verbose: bool = False) -> None:
     ...
     self.resolved_path = self._resolve_config_path()
     self.config = self._load_yaml()
+    self._experiment_classification_config = self._parse_experiment_classification_config()
```

Then reference `self._experiment_classification_config` in `_preset_to_plot_spec`.
150-153: Broad exception catch re-raises as ValueError, potentially masking the original exception type.
The `except Exception` clause catches all exceptions, including the already-raised `ValueError` from lines 139-146, wrapping them again. This could result in nested `ValueError` chains.
Consider catching only non-ValueError exceptions or restructuring the try/except:

```diff
-        except Exception as e:
-            raise ValueError(
-                f"Failed to load YAML config from {self.resolved_path}: {e}"
-            ) from e
+        except ValueError:
+            raise  # Re-raise validation errors as-is
+        except Exception as e:
+            raise ValueError(
+                f"Failed to load YAML config from {self.resolved_path}: {e}"
+            ) from e
```
155-259: Significant code duplication between `get_multi_run_plot_specs` and `get_single_run_plot_specs`.
Both methods follow the same pattern: validate list/dict types, iterate defaults, look up presets, call `_preset_to_plot_spec`, and handle errors identically.
Extract a shared helper method:

```python
def _get_plot_specs(
    self, defaults_key: str, presets_key: str, plot_type_name: str
) -> list[PlotSpec]:
    viz_config = self.config.get("visualization", {})
    defaults = viz_config.get(defaults_key, [])
    if not isinstance(defaults, list):
        raise ValueError(...)
    presets = viz_config.get(presets_key, {})
    if not isinstance(presets, dict):
        raise ValueError(...)
    # ... shared logic
```

src/aiperf/plot/core/plot_generator.py (1)
394-400: Silent exception swallowing could mask legitimate errors.
The `except Exception: pass` silently ignores all exceptions from `MetricRegistry.get_class()`, including potential bugs or misconfigurations beyond "metric not found."
Log at debug level to aid troubleshooting:

```diff
 try:
     metric_class = MetricRegistry.get_class(metric_tag)
     if metric_class.has_flags(MetricFlags.LARGER_IS_BETTER):
         return PlotMetricDirection.HIGHER
     return PlotMetricDirection.LOWER
-except Exception:
-    pass
+except KeyError:
+    pass  # Metric not in registry, check derived metrics
+except Exception as e:
+    logger = logging.getLogger(__name__)
+    logger.debug(f"Error checking metric direction for '{metric_tag}': {e}")
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (9)
- docs/diagrams/plot_examples/multi_run/config_experiment_classification/pareto_curve_throughput_per_gpu_vs_interactivity.png is excluded by `!**/*.png`
- docs/diagrams/plot_examples/multi_run/config_experiment_classification/ttft_vs_throughput.png is excluded by `!**/*.png`
- docs/diagrams/plot_examples/multi_run/pareto_curve_throughput_per_gpu_vs_interactivity.png is excluded by `!**/*.png`
- docs/diagrams/plot_examples/multi_run/pareto_curve_throughput_per_gpu_vs_latency.png is excluded by `!**/*.png`
- docs/diagrams/plot_examples/multi_run/theme_dark_mode/pareto_curve_throughput_per_gpu_vs_interactivity.png is excluded by `!**/*.png`
- docs/diagrams/plot_examples/multi_run/theme_dark_mode/pareto_curve_throughput_per_gpu_vs_latency.png is excluded by `!**/*.png`
- docs/diagrams/plot_examples/multi_run/theme_dark_mode/ttft_vs_throughput.png is excluded by `!**/*.png`
- docs/diagrams/plot_examples/multi_run/ttft_vs_throughput.png is excluded by `!**/*.png`
- docs/diagrams/plot_examples/single_run/timeslices/timeslices_throughput_warning.png is excluded by `!**/*.png`
📒 Files selected for processing (24)
- docs/tutorials/plot.md (3 hunks)
- src/aiperf/cli.py (2 hunks)
- src/aiperf/cli_utils.py (3 hunks)
- src/aiperf/common/enums/__init__.py (2 hunks)
- src/aiperf/common/enums/metric_enums.py (1 hunks)
- src/aiperf/plot/cli_runner.py (4 hunks)
- src/aiperf/plot/config.py (1 hunks)
- src/aiperf/plot/constants.py (1 hunks)
- src/aiperf/plot/core/data_loader.py (7 hunks)
- src/aiperf/plot/core/plot_generator.py (12 hunks)
- src/aiperf/plot/core/plot_specs.py (3 hunks)
- src/aiperf/plot/default_plot_config.yaml (1 hunks)
- src/aiperf/plot/exporters/png/multi_run.py (6 hunks)
- src/aiperf/plot/exporters/png/single_run.py (2 hunks)
- src/aiperf/plot/handlers/multi_run_handlers.py (5 hunks)
- src/aiperf/plot/logging.py (3 hunks)
- src/aiperf/plot/metric_names.py (3 hunks)
- src/aiperf/plot/plot_controller.py (5 hunks)
- tests/unit/plot/test_cli_runner.py (1 hunks)
- tests/unit/plot/test_experiment_classification.py (1 hunks)
- tests/unit/plot/test_plot_config.py (1 hunks)
- tests/unit/plot/test_plot_controller.py (1 hunks)
- tests/unit/plot/test_plot_generator.py (10 hunks)
- tests/unit/plot/test_png_exporter.py (28 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use async/await for all I/O operations; never use time.sleep() or blocking calls
Always use orjson for JSON operations: orjson.loads(s) and orjson.dumps(d)
All functions must have type hints on parameters and return types
Use Python 3.10+ union syntax (|) instead of typing.Union; use match/case for pattern matching; use @DataClass(slots=True)
Files:
src/aiperf/plot/constants.py, tests/unit/plot/test_plot_controller.py, src/aiperf/common/enums/__init__.py, tests/unit/plot/test_experiment_classification.py, tests/unit/plot/test_plot_config.py, src/aiperf/plot/logging.py, src/aiperf/plot/metric_names.py, src/aiperf/common/enums/metric_enums.py, src/aiperf/cli_utils.py, src/aiperf/plot/plot_controller.py, src/aiperf/plot/handlers/multi_run_handlers.py, src/aiperf/cli.py, src/aiperf/plot/core/plot_specs.py, src/aiperf/plot/config.py, tests/unit/plot/test_png_exporter.py, src/aiperf/plot/exporters/png/multi_run.py, src/aiperf/plot/core/data_loader.py, tests/unit/plot/test_cli_runner.py, src/aiperf/plot/exporters/png/single_run.py, tests/unit/plot/test_plot_generator.py, src/aiperf/plot/cli_runner.py, src/aiperf/plot/core/plot_generator.py
**/*test*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Test files must use pytest with fixtures, helpers, and @pytest.mark.parametrize; import statements at the top; use # fmt: skip for long parameterize blocks
Files:
tests/unit/plot/test_plot_controller.py, tests/unit/plot/test_experiment_classification.py, tests/unit/plot/test_plot_config.py, tests/unit/plot/test_png_exporter.py, tests/unit/plot/test_cli_runner.py, tests/unit/plot/test_plot_generator.py
🧠 Learnings (2)
📚 Learning: 2025-11-25T00:08:56.802Z
Learnt from: CR
Repo: ai-dynamo/aiperf PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-25T00:08:56.802Z
Learning: Applies to **/config/*.py : Configuration classes must inherit from BaseConfig and include Field(description='...') for all Pydantic fields
Applied to files:
src/aiperf/plot/core/plot_specs.py
📚 Learning: 2025-11-25T00:08:56.802Z
Learnt from: CR
Repo: ai-dynamo/aiperf PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-25T00:08:56.802Z
Learning: Applies to **/models/*.py : Data models must inherit from AIPerfBaseModel and use Field(description='...') for all Pydantic fields
Applied to files:
src/aiperf/plot/core/plot_specs.py, src/aiperf/plot/core/data_loader.py
🧬 Code graph analysis (15)
src/aiperf/common/enums/__init__.py (1)
src/aiperf/common/enums/metric_enums.py (1)
PlotMetricDirection(417-424)
tests/unit/plot/test_experiment_classification.py (2)
src/aiperf/plot/core/data_loader.py (2)
_classify_experiment_type (785-811), _extract_experiment_group (813-843)
src/aiperf/plot/core/plot_specs.py (1)
ExperimentClassificationConfig(48-73)
tests/unit/plot/test_plot_config.py (2)
src/aiperf/plot/config.py (4)
PlotConfig (40-448), get_multi_run_plot_specs (155-206), get_single_run_plot_specs (208-259), get_experiment_classification_config (261-289)
src/aiperf/plot/core/plot_specs.py (5)
DataSource (76-82), MetricSpec (97-109), PlotSpec (112-173), PlotType (85-94), TimeSlicePlotSpec (176-183)
src/aiperf/plot/metric_names.py (1)
src/aiperf/common/enums/metric_enums.py (2)
MetricFlags (612-708), MetricType (399-414)
src/aiperf/plot/plot_controller.py (6)
src/aiperf/plot/config.py (3)
get_experiment_classification_config (261-289), get_multi_run_plot_specs (155-206), get_single_run_plot_specs (208-259)
src/aiperf/plot/core/data_loader.py (1)
DataLoader (153-926)
src/aiperf/plot/core/mode_detector.py (2)
ModeDetector (26-240), VisualizationMode (19-23)
src/aiperf/plot/exporters/png/multi_run.py (2)
MultiRunPNGExporter (25-233), export (37-81)
src/aiperf/plot/exporters/png/single_run.py (2)
SingleRunPNGExporter (24-129), export (39-77)
src/aiperf/plot/logging.py (1)
setup_plot_logging(56-108)
src/aiperf/plot/handlers/multi_run_handlers.py (1)
src/aiperf/plot/exporters/png/multi_run.py (1)
_extract_experiment_types(202-233)
src/aiperf/cli.py (2)
src/aiperf/cli_utils.py (1)
exit_on_error (62-125)
src/aiperf/plot/cli_runner.py (1)
run_plot_controller(11-54)
src/aiperf/plot/config.py (2)
src/aiperf/plot/core/plot_specs.py (7)
DataSource (76-82), ExperimentClassificationConfig (48-73), MetricSpec (97-109), PlotSpec (112-173), PlotType (85-94), Style (15-45), TimeSlicePlotSpec (176-183)
src/aiperf/plot/metric_names.py (3)
get_aggregated_metrics (110-127), get_request_metrics (130-149), get_timeslice_metrics (152-168)
tests/unit/plot/test_png_exporter.py (3)
src/aiperf/plot/core/plot_specs.py (6)
DataSource (76-82), MetricSpec (97-109), PlotSpec (112-173), PlotType (85-94), Style (15-45), TimeSlicePlotSpec (176-183)
src/aiperf/plot/exporters/png/multi_run.py (2)
export (37-81), MultiRunPNGExporter (25-233)
src/aiperf/plot/exporters/png/single_run.py (1)
export(39-77)
src/aiperf/plot/core/data_loader.py (2)
src/aiperf/plot/core/plot_specs.py (1)
ExperimentClassificationConfig (48-73)
src/aiperf/common/protocols.py (1)
warning(71-71)
tests/unit/plot/test_cli_runner.py (1)
src/aiperf/plot/constants.py (2)
PlotMode (32-35), PlotTheme (38-42)
src/aiperf/plot/exporters/png/single_run.py (3)
src/aiperf/plot/exporters/png/multi_run.py (1)
export (37-81)
src/aiperf/plot/exporters/base.py (1)
export (50-57)
src/aiperf/plot/core/plot_specs.py (1)
PlotSpec(112-173)
tests/unit/plot/test_plot_generator.py (1)
src/aiperf/plot/core/plot_generator.py (2)
_prepare_groups (271-366), detect_directional_outliers (72-112)
src/aiperf/plot/cli_runner.py (3)
src/aiperf/cli.py (1)
plot (39-77)
src/aiperf/plot/constants.py (2)
PlotMode (32-35), PlotTheme (38-42)
src/aiperf/plot/plot_controller.py (2)
PlotController (18-168), run (68-79)
src/aiperf/plot/core/plot_generator.py (2)
src/aiperf/common/enums/metric_enums.py (3)
PlotMetricDirection (417-424), MetricFlags (612-708), has_flags (691-694)
get_metric_display_name(91-107)
🪛 markdownlint-cli2 (0.18.1)
docs/tutorials/plot.md
253-253: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
388-388: Blank line inside blockquote
(MD028, no-blanks-blockquote)
🪛 Ruff (0.14.7)
src/aiperf/plot/metric_names.py
34-42: Consider iterable unpacking instead of concatenation
(RUF005)
45-56: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
src/aiperf/plot/plot_controller.py
52-52: Do not catch blind exception: Exception
(BLE001)
142-142: Avoid specifying long messages outside the exception class
(TRY003)
src/aiperf/plot/core/plot_specs.py
152-155: Prefer TypeError exception for invalid type
(TRY004)
152-155: Avoid specifying long messages outside the exception class
(TRY003)
158-160: Avoid specifying long messages outside the exception class
(TRY003)
src/aiperf/plot/config.py
89-91: Avoid specifying long messages outside the exception class
(TRY003)
100-103: Avoid specifying long messages outside the exception class
(TRY003)
129-131: Avoid specifying long messages outside the exception class
(TRY003)
139-141: Prefer TypeError exception for invalid type
(TRY004)
139-141: Abstract raise to an inner function
(TRY301)
139-141: Avoid specifying long messages outside the exception class
(TRY003)
144-146: Abstract raise to an inner function
(TRY301)
144-146: Avoid specifying long messages outside the exception class
(TRY003)
148-148: Consider moving this statement to an else block
(TRY300)
151-153: Avoid specifying long messages outside the exception class
(TRY003)
169-172: Prefer TypeError exception for invalid type
(TRY004)
169-172: Avoid specifying long messages outside the exception class
(TRY003)
176-179: Prefer TypeError exception for invalid type
(TRY004)
176-179: Avoid specifying long messages outside the exception class
(TRY003)
185-187: Abstract raise to an inner function
(TRY301)
185-187: Avoid specifying long messages outside the exception class
(TRY003)
201-204: Avoid specifying long messages outside the exception class
(TRY003)
222-225: Prefer TypeError exception for invalid type
(TRY004)
222-225: Avoid specifying long messages outside the exception class
(TRY003)
229-232: Prefer TypeError exception for invalid type
(TRY004)
229-232: Avoid specifying long messages outside the exception class
(TRY003)
238-240: Abstract raise to an inner function
(TRY301)
238-240: Avoid specifying long messages outside the exception class
(TRY003)
254-257: Avoid specifying long messages outside the exception class
(TRY003)
279-282: Prefer TypeError exception for invalid type
(TRY004)
279-282: Avoid specifying long messages outside the exception class
(TRY003)
287-289: Avoid specifying long messages outside the exception class
(TRY003)
308-310: Prefer TypeError exception for invalid type
(TRY004)
308-310: Avoid specifying long messages outside the exception class
(TRY003)
314-314: Avoid specifying long messages outside the exception class
(TRY003)
339-339: Avoid specifying long messages outside the exception class
(TRY003)
415-418: Avoid specifying long messages outside the exception class
(TRY003)
438-441: Avoid specifying long messages outside the exception class
(TRY003)
tests/unit/plot/test_png_exporter.py
852-852: Unused method argument: sample_plot_specs
(ARG002)
946-946: Unused method argument: sample_plot_specs
(ARG002)
src/aiperf/plot/exporters/png/multi_run.py
121-121: Unused method argument: available_metrics
(ARG002)
tests/unit/plot/test_plot_generator.py
361-361: Unpacked variable groups is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
361-361: Unpacked variable display_names is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
382-382: Unpacked variable groups1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
382-382: Unpacked variable display_names1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
385-385: Unpacked variable groups2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
385-385: Unpacked variable display_names2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
399-399: Unpacked variable groups is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
399-399: Unpacked variable display_names is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
1083-1083: Unpacked variable display_names is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
1096-1096: Unpacked variable groups is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
1096-1096: Unpacked variable display_names is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
1127-1127: Unpacked variable display_names is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
src/aiperf/plot/core/plot_generator.py
398-398: Consider moving this statement to an else block
(TRY300)
399-400: try-except-pass detected, consider logging the exception
(S110)
399-399: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: integration-tests (macos-latest, 3.13)
- GitHub Check: integration-tests (ubuntu-latest, 3.11)
- GitHub Check: integration-tests (ubuntu-latest, 3.13)
- GitHub Check: integration-tests (macos-latest, 3.12)
- GitHub Check: integration-tests (macos-latest, 3.10)
- GitHub Check: integration-tests (macos-latest, 3.11)
- GitHub Check: build (macos-latest, 3.11)
- GitHub Check: build (ubuntu-latest, 3.12)
- GitHub Check: integration-tests (ubuntu-latest, 3.12)
- GitHub Check: integration-tests (ubuntu-latest, 3.10)
- GitHub Check: build (macos-latest, 3.10)
- GitHub Check: build (macos-latest, 3.12)
- GitHub Check: build (ubuntu-latest, 3.11)
- GitHub Check: build (ubuntu-latest, 3.10)
- GitHub Check: build (ubuntu-latest, 3.13)
- GitHub Check: build (macos-latest, 3.13)
🔇 Additional comments (46)
src/aiperf/common/enums/metric_enums.py (1)
417-424: `PlotMetricDirection` enum looks consistent and clear
The new `PlotMetricDirection` enum integrates cleanly with existing enum patterns (CaseInsensitiveStrEnum), and the HIGHER/LOWER semantics are well documented for plotting use.
src/aiperf/cli.py (1)
38-76: Config and verbose wiring for `plot` command looks solid
The new `config` and `verbose` parameters are well documented and correctly propagated:
- `exit_on_error(..., show_traceback=verbose)` matches the "show detailed error tracebacks" description.
- `run_plot_controller(..., theme=theme, config=config, verbose=verbose)` ensures PlotController receives the same options the user passed on the CLI.
- Examples in the docstring clearly illustrate the new flags.
No issues from a correctness standpoint.
src/aiperf/common/enums/__init__.py (1)
51-75: `PlotMetricDirection` correctly added to the public enums surface
`PlotMetricDirection` is imported from `metric_enums` and included in `__all__`, so consumers using `from aiperf.common.enums import PlotMetricDirection` will see the new enum as expected. This matches how other metric-related enums are exposed.
Also applies to: 113-179
src/aiperf/plot/cli_runner.py (1)
7-8: New `config` and `verbose` parameters are correctly threaded through to `PlotController`
- Signature and type hints align with the CLI entrypoint and PlotController's `config_path`/`verbose` parameters.
- `config_path = Path(config) if config else None` is the expected conversion.
- The docstring updates accurately describe the new arguments.
End-to-end, this keeps the CLI and runner surfaces consistent.
Also applies to: 11-48
tests/unit/plot/test_plot_controller.py (1)
274-279: Updated assertion keeps the test resilient while checking the important signal
Switching the check to:

```python
captured = capsys.readouterr()
assert "Failed to load run" in captured.out
```

focuses the test on the substantive message rather than its exact formatting (e.g., dropped "Warning:" prefix). That keeps the test useful while reducing brittleness against cosmetic log changes.
tests/unit/plot/test_cli_runner.py (1)
284-299: LGTM!
The test correctly validates that `verbose` is forwarded to `PlotController` and that `config_path=None` is explicitly passed. The test structure follows pytest conventions with proper use of fixtures and mocks.
src/aiperf/plot/exporters/png/single_run.py (1)
39-59: LGTM!
The signature update to accept `plot_specs` aligns with the `MultiRunPNGExporter` API and enables config-driven plot generation. Type hints are properly specified.
src/aiperf/cli_utils.py (2)
71-88: LGTM!
The `show_traceback` parameter provides useful control over error output verbosity. Default `True` maintains backward compatibility.
104-113: LGTM!
The conditional traceback display is correctly implemented, only showing the full exception details when `show_traceback` is `True`.
tests/unit/plot/test_plot_generator.py (2)
116-122: LGTM!
Test assertions correctly updated to validate the new Pareto plot behavior with directional arrows in axis labels and optimal direction subtitle in titles.
941-953: LGTM!
Outlier detection test assertions correctly validate directional behavior for latency vs throughput metrics.
tests/unit/plot/test_experiment_classification.py (5)
tests/unit/plot/test_experiment_classification.py (5)
1-17: LGTM!
Well-structured test module with proper imports, type hints, and pytest conventions. The test class is appropriately named and scoped.
19-68: LGTM!
Good coverage of basic classification scenarios with baseline and treatment pattern matching, including the important test for multiple patterns.
70-156: LGTM!
Excellent edge case coverage including first-match-wins behavior, default fallback, case sensitivity, and full path matching. These tests validate important classification semantics.
158-234: LGTM!
Good coverage of `_extract_experiment_group` behavior including top-level runs, concurrency subdirectories, and fallback when patterns don't match.
236-317: LGTM!
Well-designed parametrized tests for various glob patterns. The nested concurrency directory tests properly validate the immediate parent extraction logic.
src/aiperf/plot/handlers/multi_run_handlers.py (4)
src/aiperf/plot/handlers/multi_run_handlers.py (4)
61-85: LGTM with a note on duplication.
The helper correctly extracts experiment types from the DataFrame. Note that similar logic exists in `src/aiperf/plot/exporters/png/multi_run.py` (lines 201-232). Consider extracting to a shared utility if this pattern grows.
87-111: LGTM!
Clean implementation for extracting group display names. Mirrors the pattern of `_extract_experiment_types` appropriately.
143-157: LGTM!
Correctly integrates the new helper methods and passes the extracted mappings to `create_pareto_plot`.
189-203: LGTM!
Same pattern correctly applied for scatter line plots.
docs/tutorials/plot.md (3)
docs/tutorials/plot.md (3)
63-65: LGTM!
Good placement for the tip about Experiment Classification, providing early guidance for users.
180-221: LGTM!
Clear documentation of grouping customization with practical YAML examples.
386-388: LGTM!
Good best-practice tip. The blank line flagged by markdownlint between blockquotes is a minor formatting issue that doesn't affect rendering in most Markdown processors.
src/aiperf/plot/core/plot_specs.py (2)
src/aiperf/plot/core/plot_specs.py (2)
48-74: LGTM!
`ExperimentClassificationConfig` correctly inherits from `BaseConfig` and all fields have proper `Field(description=...)` annotations. Based on learnings, this follows the project conventions.
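For readers without the diff open, a rough sketch of what such a config class can look like; the field names are illustrative, and Pydantic's `BaseModel` stands in here for the project's `BaseConfig`:

```python
from pydantic import BaseModel, Field

class ExperimentClassificationConfigSketch(BaseModel):
    """Illustrative stand-in for the PR's ExperimentClassificationConfig."""

    baseline_patterns: list[str] = Field(
        default_factory=list,
        description="Glob patterns that mark a run directory as a baseline.",
    )
    treatment_patterns: list[str] = Field(
        default_factory=list,
        description="Glob patterns that mark a run directory as a treatment.",
    )
    default: str = Field(
        default="treatment",
        description="Classification used when no pattern matches.",
    )
```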
124-133: LGTM!
Clear documentation of the updated `label_by` and `group_by` field semantics.
src/aiperf/plot/plot_controller.py (3)
50-56: Logging fallback is reasonable despite static analysis hint.
The bare `Exception` catch (BLE001) is acceptable here since this is a best-effort logging setup with a graceful fallback to console-only logging. The fallback ensures the application continues even if file logging fails.
145-153: LGTM — plot specs and classification config are properly wired to exporter.
The PlotConfig integration correctly retrieves plot specs and classification config, passing them to the MultiRunPNGExporter.
166-168: LGTM — single-run export correctly passes plot specs.
The single-run export path properly retrieves and forwards plot specs to the SingleRunPNGExporter.
tests/unit/plot/test_png_exporter.py (2)
tests/unit/plot/test_png_exporter.py (2)
199-269: LGTM — multi-run plot specs fixture is well-structured.
The `sample_multi_run_plot_specs` fixture correctly demonstrates various plot types with proper metric configurations, including stats (avg, p50) for aggregated data sources.
1850-2050: Thorough validation tests for PlotSpec list normalization.
The `TestPlotSpecListValidation` class provides comprehensive coverage for the list-to-string normalization behavior, including edge cases for single elements, multiple elements, None values, and error conditions for plain strings.
src/aiperf/plot/exporters/png/multi_run.py (1)
176-200: LGTM — DataFrame construction with classification support is well-implemented.
The logic correctly handles:
- Display name mapping with fallback via `fillna()` (see the sketch below)
- Informative logging of unique experiment groups and types
- Graceful handling when `classification_config` is None
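The `fillna()` fallback mentioned above boils down to a pattern like this; the column names are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"experiment_group": ["baseline_a", "treatment_b"], "run_name": ["run1", "run2"]}
)
display_names = {"baseline_a": "Baseline A"}  # no mapping for treatment_b

# Map each group to its display name, falling back to the raw run name when unmapped.
df["display_name"] = df["experiment_group"].map(display_names).fillna(df["run_name"])
print(df["display_name"].tolist())  # ['Baseline A', 'run2']
```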
tests/unit/plot/test_plot_config.py (2)
1-22: LGTM — well-structured test module with proper imports.
The test file follows pytest conventions with clear class organization. All necessary imports from the plot configuration module are present.
416-566: Thorough classification override tests.
The `TestExperimentClassificationOverride` class provides excellent coverage for the experiment classification feature, including:
- Override behavior when enabled
- Preservation of original groups when disabled
- Override of all group types
- Nested directory structures
src/aiperf/plot/default_plot_config.yaml (4)
1-44: Well-documented configuration file with clear user guidance.
The header provides excellent documentation on the customization workflow, the quick-start guide, and the three-tier configuration priority. The defaults lists are comprehensive and cover the expected use cases.
45-80: Multi-run plot presets are correctly structured.
The multi-run plots define proper type, x/y metrics with stat suffixes, labels, groups, and titles. The configuration guide in comments (lines 49-56) clearly explains the groups/labels behavior.
81-158: Single-run plot presets with comprehensive coverage.
The single-run presets cover various visualization needs:
- Per-request scatter plots (TTFT, ITL)
- Time-series with percentiles
- Throughput area charts
- Timeslice histograms
- Dual-axis GPU utilization overlay
The dual_axis configuration (lines 145-157) correctly defines primary/secondary styles and supplementary column.
159-185: Experiment classification documentation is clear and helpful.
The commented example provides clear guidance on:
- Semantic color assignment (grey for baselines, green for treatments)
- Pattern matching priority
- Glob pattern syntax examples
The default is set to `treatment`, which is a reasonable fallback.
src/aiperf/plot/config.py (1)
1-448: LGTM overall — the configuration loading logic is well-structured.
The three-tier config resolution (custom → user → default), auto-creation of the user config, and structured YAML validation provide a solid foundation for extensible plot configuration.
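The resolution order described above (custom path → user config → packaged default, with the user copy auto-created on first run) can be pictured roughly like this; the paths and helper name are illustrative, not the PR's exact code:

```python
import shutil
from pathlib import Path

PACKAGED_DEFAULT = Path(__file__).parent / "default_plot_config.yaml"
USER_CONFIG = Path.home() / ".aiperf" / "plot_config.yaml"

def resolve_config_path(custom_path: Path | None) -> Path:
    """Prefer an explicitly supplied path, then ~/.aiperf/plot_config.yaml, then the packaged default."""
    if custom_path is not None:
        return custom_path
    if not USER_CONFIG.exists():
        # First run: copy the packaged default so users have an editable config.
        USER_CONFIG.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(PACKAGED_DEFAULT, USER_CONFIG)
    return USER_CONFIG
```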
src/aiperf/plot/core/data_loader.py (3)
59-66: LGTM — new RunMetadata fields follow the data model conventions.
The `experiment_type` and `experiment_group` fields properly use `Field(description=...)` as required by the coding guidelines. Based on learnings, data models must inherit from AIPerfBaseModel and use Field descriptions.
161-172: LGTM — DataLoader initialization properly accepts optional classification config.
The optional `classification_config` parameter with `None` default maintains backward compatibility.
785-811: LGTM — `_classify_experiment_type` correctly implements pattern-based classification.
The method properly checks baseline patterns first, then treatment patterns, and falls back to the config default. The use of `fnmatch` for glob pattern matching is appropriate.
src/aiperf/plot/core/plot_generator.py (5)
471-534: LGTM — optimal quadrant shading implementation is well-designed.
The `_add_optimal_quadrant_shading` method correctly:
- Guards against missing directions or empty data
- Calculates rectangle bounds based on metric directions
- Uses a subtle green overlay that doesn't obscure data
- Places the "★ Optimal" annotation at the correct corner with appropriate anchoring
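One way such bounds can be computed — a sketch under the assumption that the rectangle spans from the data midpoint toward the better end of each axis; the PR's actual geometry may differ:

```python
def optimal_quadrant_bounds(
    x_min: float, x_max: float, y_min: float, y_max: float,
    x_higher_is_better: bool, y_higher_is_better: bool,
) -> tuple[float, float, float, float]:
    """Return (x0, x1, y0, y1) for a shaded rectangle in the 'optimal' corner."""
    x_mid, y_mid = (x_min + x_max) / 2, (y_min + y_max) / 2
    x0, x1 = (x_mid, x_max) if x_higher_is_better else (x_min, x_mid)
    y0, y1 = (y_mid, y_max) if y_higher_is_better else (y_min, y_mid)
    return x0, x1, y0, y1

# e.g. throughput (higher is better) on x, latency (lower is better) on y -> lower-right corner
print(optimal_quadrant_bounds(0, 100, 0, 50, True, False))  # (50.0, 100, 0, 25.0)
```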
320-356: LGTM — semantic coloring for experiment groups is clear and well-documented.
The logic correctly assigns:
- Grey for all baselines
- NVIDIA Green for the first treatment
- Seaborn "bright" palette for additional treatments
The ordered output (baselines first, then treatments) ensures consistent legend ordering.
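A sketch of how that palette assignment can be built with seaborn; the color constants and the `groups` mapping are assumptions for illustration:

```python
import seaborn as sns

BASELINE_GREY = "#808080"   # assumed baseline color
NVIDIA_GREEN = "#76B900"    # assumed first-treatment color

def build_color_map(groups: dict[str, str]) -> dict[str, str]:
    """groups maps experiment_group -> 'baseline' | 'treatment'."""
    baselines = [g for g, kind in groups.items() if kind == "baseline"]
    treatments = [g for g, kind in groups.items() if kind == "treatment"]
    colors = {g: BASELINE_GREY for g in baselines}
    if treatments:
        colors[treatments[0]] = NVIDIA_GREEN
        extra = sns.color_palette("bright", n_colors=max(len(treatments) - 1, 1)).as_hex()
        for group, color in zip(treatments[1:], extra):
            colors[group] = color
    return colors
```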
449-458: Edge case: `y_word` assignment may be incorrect for LOWER direction.
When `y_dir == PlotMetricDirection.LOWER`, the current logic sets `y_word = "low"` (line 463), but the ternary `"high" if y_dir == PlotMetricDirection.HIGHER else "low"` correctly handles this. However, the quadrant determination at lines 455-458 assumes `y_dir == PlotMetricDirection.LOWER`, and the comment says "lower-right", which implies x is HIGHER. This is correct. The quadrant determination logic correctly maps all four combinations of x/y directions to their respective corner names.
578-589: LGTM — directional arrows enhance axis label clarity.
Adding ↑/↓ arrows to axis labels and generating explanatory subtitles helps users understand optimization direction at a glance.
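The arrow decoration amounts to something like the following, assuming the `PlotMetricDirection` enum exposed by this PR; the helper name is illustrative:

```python
from aiperf.common.enums import PlotMetricDirection

def axis_label(display_name: str, direction: PlotMetricDirection) -> str:
    """Append ↑ for higher-is-better metrics and ↓ for lower-is-better ones."""
    arrow = "↑" if direction == PlotMetricDirection.HIGHER else "↓"
    return f"{display_name} {arrow}"
```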
1407-1414: The `ZeroDivisionError` concern is technically inaccurate. The code `range(int(max_slice) + 1)` does not raise `ZeroDivisionError` if `max_slice` is negative; it simply creates an empty range. The actual risk is empty tick lists if `max_slice` is unexpectedly small or if upstream data validation is insufficient. Verify upstream callers to confirm whether `max_slice` is guaranteed to be a valid non-negative integer and whether empty dataframes are handled before reaching this code.
Force-pushed from 4ecdd10 to 60c29d5
Signed-off-by: Ilana Nguyen <[email protected]>
Force-pushed from 60c29d5 to f773efe
Try out this PR — Quick install:
`pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@7e483d94780acc98bc60801dbf483063e6c40026`
Recommended with a virtual environment (using uv):
`uv venv --python 3.12 && source .venv/bin/activate`
`uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@7e483d94780acc98bc60801dbf483063e6c40026`
Last updated for commit:
Plot Configuration & Visualization Enhancements
Summary
This PR builds on top of #459 to introduce configurable YAML-based plot generation and experiment classification with default coloring.
Key Additions
Plot Generation System
Configurable YAML System
- `default_plot_config.yaml`: Centralized configuration for all plot types
- `experiment_group` (directory name) to preserve treatment variants
- `groups:` settings when classification enabled
- `run_name` (each run is a separate line)
- `experiment_group` for semantic coloring
- `~/.aiperf/plot_config.yaml` for user to modify
- `--verbose` flag for detailed logging with error tracebacks
Example Plots:

Summary by CodeRabbit
Release Notes
New Features
Documentation