diff --git a/docs/AGENTS.md b/docs/AGENTS.md index e69de29..9411b7f 100644 --- a/docs/AGENTS.md +++ b/docs/AGENTS.md @@ -0,0 +1,19 @@ +# AI Agent Context Guide + +*This guide provides context and information for AI agents working with the Kubeflow Pipelines Components Repository.* + +## Coming Soon + +This document serves as a comprehensive context source for AI agents to understand: +- Repository structure and organization +- Component development patterns and standards +- Contribution workflows and processes +- Code quality requirements and testing practices +- Community guidelines and governance + +--- + +For immediate guidance, see: +- [Contributing Guide](CONTRIBUTING.md) +- [Governance Guide](GOVERNANCE.md) +- [Best Practices Guide](BESTPRACTICES.md) diff --git a/docs/BESTPRACTICES.md b/docs/BESTPRACTICES.md index e69de29..135cdd6 100644 --- a/docs/BESTPRACTICES.md +++ b/docs/BESTPRACTICES.md @@ -0,0 +1,21 @@ +# Component Development Best Practices + +*This guide is under development. Please check back soon for comprehensive best practices for developing Kubeflow Pipelines components.* + +## Coming Soon + +This document will cover: +- Component design patterns +- Performance optimization +- Security best practices +- Error handling strategies +- Documentation standards +- Testing methodologies +- Container optimization +- Resource management + +--- + +For immediate guidance, see: +- [Contributing Guide](CONTRIBUTING.md) - Complete guide with testing, setup, and workflow +- [Governance Guide](GOVERNANCE.md) - Repository policies and tiers diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index e69de29..db9e494 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -0,0 +1,320 @@ +# Contributing to Kubeflow Pipelines Components + +Welcome! This guide covers everything you need to know to contribute components and pipelines to this repository. 
+ +## Table of Contents + +- [Prerequisites](#prerequisites) +- [Quick Setup](#quick-setup) +- [What We Accept](#what-we-accept) +- [Component Structure](#component-structure) +- [Development Workflow](#development-workflow) +- [Testing and Quality](#testing-and-quality) +- [Submitting Your Contribution](#submitting-your-contribution) +- [Getting Help](#getting-help) + +## Prerequisites + +Before contributing, ensure you have the following tools installed: + +- **Python 3.10+** for component development +- **uv** ([installation guide](https://docs.astral.sh/uv/getting-started/installation)) to manage Python dependencies including `kfp` and `kfp-kubernetes` packages +- **pre-commit** ([installation guide](https://pre-commit.com/#installation)) for automated code quality checks +- **Docker or Podman** to build container images for custom components +- **kubectl** ([installation guide](https://kubernetes.io/docs/tasks/tools/)) for Kubernetes operations + +All contributors must follow the [Kubeflow Community Code of Conduct](https://github.com/kubeflow/community/blob/master/CODE_OF_CONDUCT.md). 
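The tool list above can be checked before running the setup commands. The following is a minimal sketch, not a script shipped with this repository: the function name `missing_prerequisites` and the message wording are hypothetical, and it only checks that each executable is on `PATH`, not that its version is suitable.

```python
import shutil
import sys

# Executables named in the prerequisites list (checked by name on PATH only).
REQUIRED_TOOLS = ("uv", "pre-commit", "kubectl")
# Either container tool satisfies the "Docker or Podman" prerequisite.
CONTAINER_TOOLS = ("docker", "podman")


def missing_prerequisites(which=shutil.which, version=sys.version_info[:2]):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `which` and `version` are injectable so the check can be exercised without
    touching the real environment (hypothetical helper, for illustration only).
    """
    problems = []
    if tuple(version) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not any(which(tool) for tool in CONTAINER_TOOLS):
        problems.append("neither docker nor podman found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the current interpreter and `PATH`; printing the returned list gives a quick checklist of what still needs installing.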
+ +## Quick Setup + +Get your development environment ready with these commands: + +```bash +# Fork and clone the repository +git clone https://github.com/YOUR_USERNAME/pipelines-components.git +cd pipelines-components +git remote add upstream https://github.com/kubeflow/pipelines-components.git + +# Set up Python environment +uv venv +source .venv/bin/activate +uv pip install -r requirements-dev.txt + +# Install pre-commit hooks for automatic code quality checks +pre-commit install + +# Verify your setup works +pytest +``` + +## What We Accept + +We welcome contributions of production-ready ML components and re-usable pipelines: + +- **Components** are individual ML tasks (data processing, training, evaluation, deployment) with usage examples +- **Pipelines** are complete multi-step workflows that can be nested within other pipelines +- **Bug fixes** improve existing components or fix documentation issues + +## Component Structure + +Components must be organized by category under `components//` (Core tier) or `third_party/components//` (Third-Party tier). + +Pipelines must be organized by category under `pipelines//` (Core tier) or `third_party/pipelines//` (Third-Party tier). + +## Naming Conventions + +- **Components and pipelines** use `snake_case` (e.g., `data_preprocessing`, `model_trainer`) +- **Commit messages** follow [Conventional Commits](https://conventionalcommits.org/) format with type prefix (feat, fix, docs, etc.) 
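The two naming rules above can be approximated with regular expressions. This is an illustrative sketch only and is not the validation this repository's CI runs. It assumes that `snake_case` means lowercase letters and digits separated by single underscores, and that the accepted commit types are the commonly used Conventional Commits set; the names `is_snake_case` and `is_conventional_subject` are hypothetical.

```python
import re

# Assumption: snake_case = starts with a lowercase letter, then lowercase
# letters/digits, with single underscores between non-empty segments.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")

# Assumption: a commonly used set of Conventional Commits types; the
# repository may accept a different list.
COMMIT_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# Subject line shape: type, optional (scope), optional "!", then ": " and a summary.
COMMIT_SUBJECT = re.compile(
    r"(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<summary>\S.*)"
)


def is_snake_case(name: str) -> bool:
    """Return True if a component or pipeline name looks like snake_case."""
    return SNAKE_CASE.fullmatch(name) is not None


def is_conventional_subject(subject: str) -> bool:
    """Return True if the first line of a commit message has a type prefix."""
    return COMMIT_SUBJECT.fullmatch(subject) is not None
```

Under these assumptions, `data_preprocessing` and `model_trainer` pass the name check while `my-component` does not, and `feat(training): add training component` passes the commit subject check.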
+ +### Required Files + +Every component must include these files in its directory: + +``` +components/// +├── __init__.py # Exposes the component function for imports +├── component.py # Main implementation +├── metadata.yaml # Complete specification (see schema below) +├── README.md # Overview, inputs/outputs, usage examples, development instructions +├── OWNERS # Maintainers (at least one Kubeflow SIG owner for Core tier) +├── Containerfile # Container definition (required only for Core tier custom images) +├── example_pipelines.py # Working usage examples +└── tests/ +│ └── test_component.py # Unit tests +└── +``` + +Similarly, every pipeline must include these files: +``` +pipelines/// +├── __init__.py # Exposes the pipeline function for imports +├── pipeline.py # Main implementation +├── metadata.yaml # Complete specification (see schema below) +├── README.md # Overview, inputs/outputs, usage examples, development instructions +├── OWNERS # Maintainers (at least one Kubeflow SIG owner for Core tier) +├── example_pipelines.py # Working usage examples +└── tests/ +│ └── test_pipeline.py # Unit tests +└── +``` + +### metadata.yaml Schema + +Your `metadata.yaml` must include these fields: + +```yaml +name: my_component +tier: core # or 'third_party' +stability: stable # 'alpha', 'beta', or 'stable' +dependencies: + kubeflow: + - name: Pipelines + version: '>=2.5' + external_services: # Optional list of external dependencies + - name: Argo Workflows + version: "3.6" +tags: # Optional keywords for discoverability + - training + - evaluation +lastVerified: 2025-11-18T00:00:00Z # Updated annually; components are removed after 12 months without update +ci: + compile_check: true # Validates component compiles with kfp.compiler + skip_dependency_probe: false # Optional. 
Set true only with justification + pytest: optional # Set to 'required' for Core tier +links: # Optional, can use custom key-value (not limited to documentation, issue_tracker) + documentation: https://kubeflow.org/components/my_component + issue_tracker: https://github.com/kubeflow/pipelines-components/issues +``` + +### OWNERS File + +The OWNERS file lets component owners self-serve maintenance tasks including approvals, metadata updates, and lifecycle management: + +```yaml +approvers: + - maintainer1 # At least one must be a Kubeflow SIG owner/team member for Core tier + - maintainer2 +reviewers: + - reviewer1 +``` + +The `OWNERS` file enables code review automation through Prow commands: +- **Reviewers** (as well as **Approvers**), upon reviewing a PR and finding it good to merge, can comment `/lgtm`, which applies the `lgtm` label to the PR. +- **Approvers** (but not **Reviewers**) can comment `/approve`, which applies the `approved` label and signifies the PR is approved for automation to merge into the repo. +- If a PR has been labeled with both `lgtm` and `approved`, and all required CI checks are passing, Prow will merge the PR into the destination branch. + +See [full Prow documentation](https://docs.prow.k8s.io/docs/components/plugins/approve/approvers/#lgtm-label) for usage details. + + + +## Development Workflow + +### 1. Create Your Feature Branch + +Start by syncing with upstream and creating a feature branch: + +```bash +git fetch upstream +git checkout main +git merge upstream/main +git checkout -b component/my-component +``` + +### 2. Implement Your Component + +Create your component following the structure above. Here's a basic template: + +```python +# component.py +from kfp import dsl + +@dsl.component(base_image="python:3.10") +def hello_world(name: str = "World") -> str: + """A simple hello world component. + + Args: + name: The name to greet. Defaults to "World". + + Returns: + A greeting message. + """ + message = f"Hello, {name}!" 
+ print(message) + return message +``` + +Write comprehensive tests for your component: + +```python +# tests/test_component.py +from ..component import hello_world + +def test_hello_world_default(): + """Test hello_world with default parameter.""" + # Access the underlying Python function from the component + result = hello_world.python_func() + assert result == "Hello, World!" + + +def test_hello_world_custom_name(): + """Test hello_world with custom name.""" + result = hello_world.python_func(name="Kubeflow") + assert result == "Hello, Kubeflow!" +``` + +### 3. Document Your Component + +This repository requires a standardized README.md. As such, we have provided a README Generation utility, which can be found in the `scripts` directory. + +Read more in the [README Generator Script Documentation](./scripts/generate_readme/README.md) + +## Testing and Quality + +### Running Tests Locally + +Run these commands from your component/pipeline directory before submitting your contribution: + +```bash +# Run all unit tests with coverage reporting +pytest --cov=src --cov-report=html + +# Run specific test files when debugging +pytest tests/test_my_component.py -v +``` + +### Code Quality Checks + +Ensure your code meets quality standards: + +```bash +# Format checking (120 character line length) +black --check --line-length 120 . + +# Docstring validation (Google convention) +pydocstyle --convention=google . 
+ +# Validate metadata schema +python scripts/validate_metadata.py + +# Run all pre-commit hooks +pre-commit run --all-files +``` + +### Building Custom Container Images + +If your component uses a custom image, test the container build: + +```bash +# Build your component image (docker needs -f because the file is named Containerfile, not Dockerfile) +docker build -f components//my-component/Containerfile -t my-component:test components//my-component/ + +# Test the container runs correctly +docker run --rm my-component:test --help +``` + +### CI Pipeline + +GitHub Actions automatically runs these checks on every pull request: + +- Code formatting (Black), linting (Flake8), docstring validation (pydocstyle), type checking (MyPy) +- Unit and integration tests with coverage reporting +- Container image builds for components with Containerfiles +- Security vulnerability scans +- Metadata schema validation +- Standardized README content and formatting conformance + +## Submitting Your Contribution + +### Commit Your Changes + +Use descriptive commit messages following the [Conventional Commits](https://conventionalcommits.org/) format: + +```bash +git add . 
+git status # Review what you're committing +git diff --cached # Check the actual changes + +git commit -m "feat(training): add training component + +- Implements Core-Tier component +- Includes comprehensive unit tests with 95% coverage +- Provides working pipeline examples +- Resolves #123" +``` + +### Push and Create Pull Request + +Push your changes and create a pull request on GitHub: + +```bash +git push origin component/my-component +``` + +On GitHub, click "Compare & pull request" and fill out the PR template provided with appropriate details + +All PRs must pass: +- Automated checks (linting, tests, builds) +- Code review by maintainers and community members +- Documentation review + +### Review Process + +All pull requests must complete the following: +- All Automated CI checks successfully passing +- Code Review - reviewers will verify the following: + - Component works as described + - Code is clean and well-documented + - Included tests provide good coverage. +- Receive approval from component OWNERS (for updates to existing components) or repository maintainers (for new components) + +## Getting Help + +- **Governance questions**: See [GOVERNANCE.md](GOVERNANCE.md) for tier requirements and processes +- **Community discussion**: Join `#kubeflow-pipelines` channel on the [CNCF Slack](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels) +- **Bug reports and feature requests**: Open an issue at [GitHub Issues](https://github.com/kubeflow/pipelines-components/issues) + +--- + +This repository was established through [KEP-913: Components Repository](https://github.com/kubeflow/community/tree/master/proposals/913-components-repo). + +Thanks for contributing to Kubeflow Pipelines! 
🚀 diff --git a/docs/GOVERNANCE.md b/docs/GOVERNANCE.md index e69de29..71b9b22 100644 --- a/docs/GOVERNANCE.md +++ b/docs/GOVERNANCE.md @@ -0,0 +1,200 @@ +# Repository Governance + +This document defines the governance structure for the Kubeflow Pipelines Components Repository. + +## Table of Contents + +- [Two-Tier System](#two-tier-system) +- [Ownership Models](#ownership-models) +- [Tier Transitions](#tier-transitions) +- [Removal Policies](#removal-policies) +- [Deprecation Policy](#deprecation-policy) +- [Repository Roles](#repository-roles) +- [Decision Making](#decision-making) +- [Conflict Resolution](#conflict-resolution) +- [Policy Updates](#policy-updates) +- [Related Documentation](#related-documentation) +- [Background](#background) + +## Repository Roles + +*Key roles and responsibilities for governing and maintaining the repository.* + +### KFP Component Repository Maintainer + +Repository Maintainers are responsible for the stewardship of the Kubeflow Pipelines Components repository. They are defined by having their GitHub username listed in the `approvers` section of the `OWNERS` file in the repository root. + +Repository Maintainers' key responsibilities include: +- Orchestrating releases +- Setting roadmaps and accepting KEPs related to Kubeflow Pipelines Components +- Managing the overall project, issues, etc. +- General repository maintenance + + +### Core Component Maintainer + +Core Component Maintainers are individuals responsible for maintaining an individual core-tier component or pipeline. They are defined by having their GitHub username listed in the `approvers` section of the `OWNERS` file of at least one individual core-tier component or pipeline. + +Core Component Maintainers' key responsibilities include: +- Acting as the main point of contact for their component(s). +- Reviewing and approving changes to their component(s). +- Ensuring ongoing quality and documentation for their component(s). 
+- Updating or transferring ownership when maintainers change. + +Note that all components must have at least two listed owners for redundancy and review coverage. + + +### Third-Party Component Maintainers + +Similar to a Core Component Maintainer, a Third-Party Component Maintainer is responsible for at least one Third-Party tier component or pipeline that they or their team own. They are defined by having their GitHub username listed in the `approvers` section of the `OWNERS` file of at least one individual third-party tier component or pipeline. + +Third-Party Component Maintainers' key responsibilities include: +- Acting as the main point of contact for their component(s). +- Reviewing and approving changes to their component(s). +- Ensuring ongoing quality and documentation for their component(s). +- Updating or transferring ownership when maintainers change. + +Note that all components must have at least two listed owners for redundancy and review coverage. + +## Two-Tier System + +*The repository uses a two-tier classification system distinguishing officially supported components from community contributions.* + +### Core Tier + + +**Officially supported components** maintained by at least 2 Core Component Maintainers. + +**Requirements:** +- Security review passed +- Complete documentation +- Active maintenance commitment +- Backward compatibility guarantees +- Unit tests provided with high code coverage + +**Benefits:** +- Official support and maintenance +- Included in Python package releases +- Priority for bug fixes +- Long-term stability guarantees + +### Third-Party Tier + +**Community-contributed components** with lighter requirements. 
+ +**Requirements:** +- Unit test provided +- Basic documentation (README, examples) +- At least 2 maintainers + +**Benefits:** +- Community visibility +- Shared maintenance burden +- Faster contribution process than Core components +- Good for idea incubation +- Potential for promotion to Core tier + +## Ownership Models + +*How ownership, maintenance, and decision-making responsibilities are distributed across tiers.* + +### Core Tier +- **Owned by**: Kubeflow community +- **Maintained by**: Designated maintainer teams +- **Decisions by**: Repository and Core Component Maintainers consensus +- **Support**: Official community support + +### Third-Party Tier (no Kubeflow org membership required) +- **Owned by**: Original contributors +- **Maintained by**: Component owners +- **Decisions by**: Component owners +- **Support**: Best-effort community support + +## Tier Transitions + +*Process for moving components between Core and Third-Party tiers.* + +## Removal Policies + +*Timeline and criteria for removing inactive or problematic components from the repository.* + +### Verification Process (9 months) +Components are marked for verification if: +- No updates in over 9 months +- Maintainers are unresponsive +- Compatibility issues + +### Removal Process (12 months) +After 12 months of inactivity: +1. **Notice**: 30-day removal notice +2. **Community input**: 2-week feedback period +3. **Final decision**: KFP Component Repository Maintainers +4. **Removal**: Delete component code from repository + +### Emergency Removal +Immediate removal for: +- Severe and/or compatibility-breaking issues +- Critical security vulnerabilities +- Legal issues +- Malicious code + +## Deprecation Policy + +*Structured approach to deprecating core components with adequate notice and migration support.* + +### Two-Release Policy +Components will be deprecated for a minimum of 2 Kubeflow releases before removal. + +**Process:** +1. **Deprecation notice**: Mark as deprecated +2. 
**Migration guide**: Provide alternatives +3. **Community notice**: Announce in releases +4. **Removal**: After 2 releases + + +## Decision Making + +*Framework for making technical, policy, and strategic decisions within the community.* + +### Decision Types +- **Technical**: Component owners → KFP Component Repository Maintainers +- **Policy**: KFP Component Repository Maintainers +- **Strategic**: KFP Component Repository Maintainers + +### Process +1. **Proposal**: Create GitHub issue/RFC +2. **Discussion**: Community feedback +3. **Decision**: Appropriate authority level +4. **Implementation**: Assign and track + +## Policy Updates + +*How governance policies are updated to evolve with community needs and learnings.* + +**Process:** +1. **RFC**: Propose changes via GitHub issue +2. **Community review**: 2-week feedback period +3. **Maintainers approval**: Majority vote required +4. **Implementation**: Update documentation and processes + +**Criteria for updates:** +- Community needs evolution +- Process improvements +- Conflict resolution learnings +- External requirements changes + +--- + +This governance model ensures quality, sustainability, and community collaboration while maintaining clear processes and expectations. + +## Related Documentation + +- **[Contributing Guide](CONTRIBUTING.md)** - Complete contributor guide with setup, testing, and workflow +- **[Best Practices Guide](BESTPRACTICES.md)** - Component development best practices *(coming soon)* +- **[Agents Guide](AGENTS.md)** - AI agent guidance *(coming soon)* + +## Background + +This governance model is based on [KEP-913: Components Repository](https://github.com/kubeflow/community/tree/master/proposals/913-components-repo), which established the framework for a curated collection of reusable Kubeflow Pipelines components with clear quality standards and community governance. 
+ +For questions about governance, contact the pipelines-components repository maintainers (as noted by `approvers` in top-level `OWNERS` file) or open a GitHub issue. diff --git a/docs/ONBOARDING.md b/docs/ONBOARDING.md deleted file mode 100644 index e69de29..0000000