Summary
Implement a comprehensive monitoring system that continuously tracks the status and performance of each environment in the RL curriculum, with automated warnings for unexpected behavior patterns such as performance regression in graduated environments.
Background
The current curriculum system in Marin tracks environments through different states (locked → unlocked → graduated) and monitors performance during active training. However, once an environment graduates, there's limited visibility into its ongoing performance. This can lead to silent regressions where a previously mastered environment deteriorates without notice.
Problem Statement
- Limited Post-Graduation Visibility: Once an environment graduates (reaches stop_threshold and plateaus), we stop actively monitoring its performance
- No Regression Detection: If a graduated environment's performance degrades during subsequent training, this goes unnoticed
- Lack of Holistic View: No centralized view of curriculum health across all environments and their historical performance
- Missing Early Warning System: No proactive alerts for concerning patterns like:
- Graduated environments showing performance drops
- Environments stuck in training without progress
- Unusual performance volatility
Proposed Solution
1. Continuous Performance Tracking
- Track performance metrics for ALL environments (not just active ones) during periodic evaluations
- Store historical performance data with timestamps
- Maintain separate tracking for:
- Training performance
- Evaluation performance
- Time spent in each state (locked/active/graduated)
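One possible shape for the tracking described above is sketched below: a per-environment history of timestamped records, split by training vs. evaluation mode. The PerformanceRecord and PerformanceTracker names are illustrative only and do not exist in the codebase.
# Sketch only -- PerformanceRecord and PerformanceTracker are hypothetical names.
import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PerformanceRecord:
    """A single timestamped performance observation for one environment."""
    step: int
    timestamp: float
    performance: float
    mode: str   # "train" or "eval"
    state: str  # "locked", "active", "graduated"


@dataclass
class PerformanceTracker:
    """Keeps the full performance history for every environment."""
    history: dict[str, list[PerformanceRecord]] = field(default_factory=lambda: defaultdict(list))

    def record(self, env_id: str, step: int, performance: float, mode: str, state: str) -> None:
        self.history[env_id].append(
            PerformanceRecord(step=step, timestamp=time.time(), performance=performance, mode=mode, state=state)
        )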
2. Monitoring Dashboard/Metrics
Create a comprehensive monitoring system that provides:
from dataclasses import dataclass


@dataclass
class EnvironmentHealthMetrics:
    """Health metrics for a single environment."""
    env_id: str
    current_state: str  # "locked", "active", "graduated"

    # Performance tracking
    current_train_performance: float
    current_eval_performance: float
    peak_performance: float
    performance_trend: str  # "improving", "stable", "regressing"

    # State duration
    time_in_current_state: int
    total_training_steps: int

    # Health indicators
    health_status: str  # "healthy", "warning", "critical"
    warnings: list[str]  # List of active warnings

3. Automated Warning System
Implement configurable alerts for:
- Graduated Environment Regression: Warn if eval performance drops by X% from graduation level
- Training Stagnation: Alert if an active environment shows no progress for N steps
- Unexpected State Transitions: Flag unusual patterns (e.g., rapid graduation followed by regression)
- Performance Volatility: Detect and warn about unstable performance patterns
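The Warning object used in the implementation sketches under section 4 is not defined in the current codebase. A minimal version of it, plus one possible interpretation of the training-stagnation check listed above, could look like the following (the field names and the check_training_stagnation helper are assumptions):
# Sketch only -- the Warning record and stagnation check are illustrative, not existing code.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Warning:
    """A single health warning emitted by the monitor (mirrors the fields used in section 4b)."""
    type: str      # e.g. "graduated_regression", "training_stagnation"
    env_id: str
    message: str
    severity: str  # "low", "medium", "high"


def check_training_stagnation(env_id: str, recent_performance: list[float], stagnation_window: int) -> Optional[Warning]:
    """Warn if an active environment shows no measurable progress over the window (one simple interpretation)."""
    if len(recent_performance) < stagnation_window:
        return None
    window = recent_performance[-stagnation_window:]
    if max(window) - min(window) < 1e-3:  # effectively flat across the whole window
        return Warning(
            type="training_stagnation",
            env_id=env_id,
            message=f"No progress in {env_id} over the last {stagnation_window} steps",
            severity="medium",
        )
    return None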
4. Implementation Details
a) Extend Curriculum class
class Curriculum:
    def __init__(self, ...):
        # ... existing init ...
        self.health_monitor = CurriculumHealthMonitor(config)

    def update_stats(self, ...):
        # ... existing update logic ...

        # Add health monitoring
        self.health_monitor.update(lesson_id, stats, mode)
        warnings = self.health_monitor.check_warnings()
        if warnings:
            self._handle_warnings(warnings)
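The _handle_warnings hook in the sketch above is not specified; one minimal interpretation (assumed, not existing code) is to log each warning and keep the latest set visible on the curriculum object:
# Sketch only -- a minimal _handle_warnings, assumed for illustration.
import logging

logger = logging.getLogger(__name__)


class Curriculum:
    # ... methods from the sketch above ...

    def _handle_warnings(self, warnings: list) -> None:
        """Record active warnings and surface them in the training logs."""
        for warning in warnings:
            logger.warning("[curriculum health] %s (%s): %s", warning.env_id, warning.severity, warning.message)
        # Keep the latest warnings around so they can be exported to wandb/tensorboard later.
        self.active_warnings = warnings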
b) Add HealthMonitor component
from collections import defaultdict
from typing import Optional


class CurriculumHealthMonitor:
    def __init__(self, config: HealthMonitorConfig):
        self.config = config
        self.historical_performance = defaultdict(list)
        self.graduation_baselines = {}
        self.active_warnings = defaultdict(list)

    def check_graduated_regression(self, env_id: str) -> Optional[Warning]:
        """Check if graduated environment has regressed."""
        if env_id not in self.graduation_baselines:
            return None

        current_perf = self.get_current_performance(env_id)
        baseline = self.graduation_baselines[env_id]

        if current_perf < baseline * self.config.regression_threshold:
            return Warning(
                type="graduated_regression",
                env_id=env_id,
                message=f"Graduated env {env_id} performance dropped from {baseline:.3f} to {current_perf:.3f}",
                severity="high",
            )
        return None

c) Logging and Visualization
- Log warnings to wandb/tensorboard with appropriate tags
- Create visualization dashboards showing:
- Curriculum state diagram with performance overlays
- Historical performance charts for each environment
- Warning/alert timeline
- Health status summary
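As a concrete example of the logging half, a sketch of how per-environment health metrics and warning counts could be pushed to wandb. The metric names and the get_health_metrics() accessor are assumptions, not existing APIs in the codebase:
# Sketch only -- metric names and the get_health_metrics() accessor are assumed.
import wandb


def log_curriculum_health(health_monitor, step: int) -> None:
    """Export per-environment health metrics and active warning counts to wandb."""
    payload = {}
    for env_id, metrics in health_monitor.get_health_metrics().items():
        payload[f"curriculum_health/{env_id}/eval_performance"] = metrics.current_eval_performance
        payload[f"curriculum_health/{env_id}/peak_performance"] = metrics.peak_performance
        payload[f"curriculum_health/{env_id}/num_warnings"] = len(metrics.warnings)
    wandb.log(payload, step=step)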
5. Configuration
Add configuration options:
health_monitoring:
  enabled: true
  regression_threshold: 0.85   # Warn if performance drops below 85% of graduation level
  stagnation_window: 100       # Steps without progress before warning
  evaluation_frequency: 50     # How often to evaluate all environments
  warning_cooldown: 200        # Steps before re-issuing same warning
  metrics_retention_days: 30   # How long to keep historical data
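The HealthMonitorConfig referenced in the monitor sketch above would mirror this block; a minimal dataclass version (assumed, not existing code) could look like:
# Sketch only -- field names mirror the YAML block above; the dataclass itself is assumed.
from dataclasses import dataclass


@dataclass
class HealthMonitorConfig:
    enabled: bool = True
    regression_threshold: float = 0.85   # warn below 85% of the graduation baseline
    stagnation_window: int = 100         # steps without progress before warning
    evaluation_frequency: int = 50       # how often to evaluate all environments
    warning_cooldown: int = 200          # steps before re-issuing the same warning
    metrics_retention_days: int = 30     # how long to keep historical data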
Benefits
- Early Problem Detection: Catch performance regressions before they become critical
- Better Training Stability: Maintain consistent performance across all learned tasks
- Improved Debugging: Clear visibility into curriculum dynamics and problem areas
- Confidence in Deployment: Ensure model maintains competence across all trained environments
Success Criteria
- All environments are continuously monitored regardless of state
- Warnings are generated for configurable regression thresholds
- Historical performance data is tracked and accessible
- Integration with existing logging/visualization tools (wandb, tensorboard)
- Minimal performance overhead (<5% additional compute)
Open Questions
- Should we automatically intervene when warnings are detected (e.g., re-activate graduated environments)?
- What's the appropriate balance between monitoring frequency and computational cost?
- Should health metrics influence curriculum sampling weights?
- How long should we retain historical performance data?
Related Issues/Context
- Current curriculum implementation: src/marin/rl/curriculum.py
- Environment evaluation: src/marin/rl/evaluate_environment.py
- Consider integration with existing rollout stats tracking