| 1 | +# Contributing to Kubeflow Pipelines Components |
| 2 | + |
| 3 | +Welcome! This guide covers everything you need to know to contribute components and pipelines to this repository. |
| 4 | + |
| 5 | +## Table of Contents |
| 6 | + |
| 7 | +- [Prerequisites](#prerequisites) |
| 8 | +- [Quick Setup](#quick-setup) |
| 9 | +- [What We Accept](#what-we-accept) |
| 10 | +- [Component Structure](#component-structure) |
| 11 | +- [Development Workflow](#development-workflow) |
| 12 | +- [Testing and Quality](#testing-and-quality) |
| 13 | +- [Submitting Your Contribution](#submitting-your-contribution) |
| 14 | +- [Getting Help](#getting-help) |
| 15 | + |
| 16 | +## Prerequisites |
| 17 | + |
| 18 | +Before contributing, ensure you have the following tools installed: |
| 19 | + |
| 20 | +- **Python 3.10+** for component development |
| 21 | +- **uv** ([installation guide](https://docs.astral.sh/uv/getting-started/installation)) to manage Python dependencies, including the `kfp` and `kfp-kubernetes` packages |
| 22 | +- **pre-commit** ([installation guide](https://pre-commit.com/#installation)) for automated code quality checks |
| 23 | +- **Docker or Podman** to build container images for custom components |
| 24 | +- **kubectl** ([installation guide](https://kubernetes.io/docs/tasks/tools/)) for Kubernetes operations |
| 25 | + |
| 26 | +All contributors must follow the [Kubeflow Community Code of Conduct](https://github.com/kubeflow/community/blob/master/CODE_OF_CONDUCT.md). |
| 27 | + |
| 28 | +## Quick Setup |
| 29 | + |
| 30 | +Get your development environment ready with these commands: |
| 31 | + |
| 32 | +```bash |
| 33 | +# Fork and clone the repository |
| 34 | +git clone https://github.com/YOUR_USERNAME/pipelines-components.git |
| 35 | +cd pipelines-components |
| 36 | +git remote add upstream https://github.com/kubeflow/pipelines-components.git |
| 37 | + |
| 38 | +# Set up Python environment |
| 39 | +uv venv |
| 40 | +source .venv/bin/activate |
| 41 | +uv pip install -r requirements-dev.txt |
| 42 | + |
| 43 | +# Install pre-commit hooks for automatic code quality checks |
| 44 | +pre-commit install |
| 45 | + |
| 46 | +# Verify your setup works |
| 47 | +pytest |
| 48 | +``` |
| 49 | + |
| 50 | +## What We Accept |
| 51 | + |
| 52 | +We welcome contributions of production-ready ML components and reusable pipelines: |
| 53 | + |
| 54 | +- **Components** are individual ML tasks (data processing, training, evaluation, deployment) with usage examples |
| 55 | +- **Pipelines** are complete multi-step workflows that can be nested within other pipelines |
| 56 | +- **Bug fixes** improve existing components or fix documentation issues |
| 57 | + |
| 58 | +## Component Structure |
| 59 | + |
| 60 | +Components must be organized by category under `components/<category>/` (Core tier) or `third_party/components/<category>/` (Third-Party tier). |
| 61 | + |
| 62 | +Pipelines must be organized by category under `pipelines/<category>/` (Core tier) or `third_party/pipelines/<category>/` (Third-Party tier). |
| 63 | + |
| 64 | +### Naming Conventions |
| 65 | + |
| 66 | +- **Components and pipelines** use `snake_case` (e.g., `data_preprocessing`, `model_trainer`) |
| 67 | +- **Commit messages** follow [Conventional Commits](https://conventionalcommits.org/) format with type prefix (feat, fix, docs, etc.) |
| 68 | + |
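The `snake_case` rule above can be checked mechanically. The following is a minimal sketch, not a script from this repository: the exact pattern (lowercase letters and digits, single underscores between words) and the helper name `is_snake_case` are assumptions.

```python
import re

# Assumed reading of "snake_case": lowercase words made of letters and digits,
# joined by single underscores, starting with a letter.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if `name` matches the assumed snake_case pattern."""
    return SNAKE_CASE.fullmatch(name) is not None
```

Under this pattern `data_preprocessing` and `model_trainer` pass, while `my-component` and `ModelTrainer` do not.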
| 69 | +### Required Files |
| 70 | + |
| 71 | +Every component must include these files in its directory: |
| 72 | + |
| 73 | +``` |
| 74 | +components/<category>/<component_name>/ |
| 75 | +├── __init__.py # Exposes the component function for imports |
| 76 | +├── component.py # Main implementation |
| 77 | +├── metadata.yaml # Complete specification (see schema below) |
| 78 | +├── README.md # Overview, inputs/outputs, usage examples, development instructions |
| 79 | +├── OWNERS # Maintainers (at least one Kubeflow SIG owner for Core tier) |
| 80 | +├── Containerfile # Container definition (required only for Core tier custom images) |
| 81 | +├── example_pipelines.py # Working usage examples |
| 82 | +├── tests/ |
| 83 | +│   └── test_component.py # Unit tests |
| 84 | +└── <supporting_files> |
| 85 | +``` |
| 86 | + |
| 87 | +Similarly, every pipeline must include these files: |
| 88 | +``` |
| 89 | +pipelines/<category>/<pipeline_name>/ |
| 90 | +├── __init__.py # Exposes the pipeline function for imports |
| 91 | +├── pipeline.py # Main implementation |
| 92 | +├── metadata.yaml # Complete specification (see schema below) |
| 93 | +├── README.md # Overview, inputs/outputs, usage examples, development instructions |
| 94 | +├── OWNERS # Maintainers (at least one Kubeflow SIG owner for Core tier) |
| 95 | +├── example_pipelines.py # Working usage examples |
| 96 | +├── tests/ |
| 97 | +│   └── test_pipeline.py # Unit tests |
| 98 | +└── <supporting_files> |
| 99 | +``` |
| 100 | + |
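Before opening a pull request, it can help to confirm that a component directory contains the files listed above. The sketch below is illustrative only: `missing_files` and `COMPONENT_FILES` are hypothetical names, and `Containerfile` is left out because the layout above requires it only for Core tier custom images.

```python
from pathlib import Path

# Files from the component layout above; Containerfile is conditional and omitted.
COMPONENT_FILES = [
    "__init__.py",
    "component.py",
    "metadata.yaml",
    "README.md",
    "OWNERS",
    "example_pipelines.py",
    "tests/test_component.py",
]


def missing_files(component_dir, required=COMPONENT_FILES) -> list[str]:
    """Return the required relative paths that are not regular files under `component_dir`."""
    root = Path(component_dir)
    return [rel for rel in required if not (root / rel).is_file()]
```

For a pipeline directory, the same function could be called with a list that swaps in `pipeline.py` and `tests/test_pipeline.py`.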
| 101 | +### metadata.yaml Schema |
| 102 | + |
| 103 | +Your `metadata.yaml` must include these fields: |
| 104 | + |
| 105 | +```yaml |
| 106 | +name: my_component |
| 107 | +tier: core # or 'third_party' |
| 108 | +stability: stable # 'alpha', 'beta', or 'stable' |
| 109 | +dependencies: |
| 110 | +  kubeflow: |
| 111 | +    - name: Pipelines |
| 112 | +      version: '>=2.5' |
| 113 | +  external_services: # Optional list of external dependencies |
| 114 | +    - name: Argo Workflows |
| 115 | +      version: "3.6" |
| 116 | +tags: # Optional keywords for discoverability |
| 117 | +  - training |
| 118 | +  - evaluation |
| 119 | +lastVerified: 2025-11-18T00:00:00Z # Updated annually; components are removed after 12 months without update |
| 120 | +ci: |
| 121 | +  compile_check: true # Validates component compiles with kfp.compiler |
| 122 | +  skip_dependency_probe: false # Optional. Set true only with justification |
| 123 | +  pytest: optional # Set to 'required' for Core tier |
| 124 | +links: # Optional, can use custom key-value (not limited to documentation, issue_tracker) |
| 125 | +  documentation: https://kubeflow.org/components/my_component |
| 126 | +  issue_tracker: https://github.com/kubeflow/pipelines-components/issues |
| 127 | +``` |
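The rules stated in the schema comments can be expressed as a small check over an already-parsed `metadata.yaml`. This is a sketch under stated assumptions and is not the repository's `scripts/validate_metadata.py`: the required-key list is one reading of the example above, 365 days stands in for "12 months", `lastVerified` is assumed to be an ISO 8601 string or a `datetime`, and `metadata_problems` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumed required keys: everything in the example not marked optional.
REQUIRED_KEYS = ("name", "tier", "stability", "dependencies", "lastVerified", "ci")
TIERS = {"core", "third_party"}
STABILITY = {"alpha", "beta", "stable"}
MAX_AGE = timedelta(days=365)  # approximation of the 12-month rule


def _as_utc(value) -> datetime:
    """Accept an ISO 8601 string or a datetime; treat naive values as UTC."""
    if isinstance(value, str):
        value = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value


def metadata_problems(meta: dict, now: datetime) -> list[str]:
    """Return human-readable problems found in a parsed metadata mapping."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in meta]
    if "tier" in meta and meta["tier"] not in TIERS:
        problems.append(f"invalid tier: {meta['tier']!r}")
    if "stability" in meta and meta["stability"] not in STABILITY:
        problems.append(f"invalid stability: {meta['stability']!r}")
    if "lastVerified" in meta and now - _as_utc(meta["lastVerified"]) > MAX_AGE:
        problems.append("lastVerified is older than 12 months")
    if meta.get("tier") == "core" and meta.get("ci", {}).get("pytest") != "required":
        problems.append("core tier requires ci.pytest: required")
    return problems
```

Passing `now` explicitly keeps the staleness check deterministic in tests; a caller would normally supply `datetime.now(timezone.utc)`.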
| 128 | + |
| 129 | +### OWNERS File |
| 130 | + |
| 131 | +The OWNERS file enables component owners to self-service maintenance tasks including approvals, metadata updates, and lifecycle management: |
| 132 | + |
| 133 | +```yaml |
| 134 | +approvers: |
| 135 | + - maintainer1 # At least one must be a Kubeflow SIG owner/team member for Core tier |
| 136 | + - maintainer2 |
| 137 | +reviewers: |
| 138 | + - reviewer1 |
| 139 | +``` |
| 140 | + |
| 141 | +The `OWNERS` file also drives code review automation through Prow commands: |
| 142 | +- **Reviewers** (and **Approvers**) can comment `/lgtm` once they have reviewed a PR and consider it ready to merge; this applies the `lgtm` label to the PR. |
| 143 | +- **Approvers** (but not **Reviewers**) can comment `/approve`, which signifies that the PR is approved for automation to merge into the repo and applies the `approved` label. |
| 144 | +- If a PR carries both the `lgtm` and `approved` labels and all required CI checks are passing, Prow merges the PR into the destination branch. |
| 145 | + |
| 146 | +See [full Prow documentation](https://docs.prow.k8s.io/docs/components/plugins/approve/approvers/#lgtm-label) for usage details. |
| 147 | + |
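The merge condition described above reduces to a simple predicate. The sketch below only illustrates that rule and is not how Prow is implemented: the function name, the label names passed as defaults, and the shape of `ci_checks` are assumptions.

```python
def ready_to_merge(
    labels: set[str],
    ci_checks: dict[str, bool],
    required_labels: frozenset[str] = frozenset({"lgtm", "approved"}),
) -> bool:
    """Return True when every required label is present and every CI check passed.

    `ci_checks` maps a check name to whether it passed (hypothetical shape).
    An empty mapping counts as "all passing".
    """
    return required_labels <= labels and all(ci_checks.values())
```

A PR that has `lgtm` but has not yet been approved, or one with a failing required check, stays open until the missing condition is met.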
| 148 | + |
| 149 | + |
| 150 | +## Development Workflow |
| 151 | + |
| 152 | +### 1. Create Your Feature Branch |
| 153 | + |
| 154 | +Start by syncing with upstream and creating a feature branch: |
| 155 | + |
| 156 | +```bash |
| 157 | +git fetch upstream |
| 158 | +git checkout main |
| 159 | +git merge upstream/main |
| 160 | +git checkout -b component/my-component |
| 161 | +``` |
| 162 | + |
| 163 | +### 2. Implement Your Component |
| 164 | + |
| 165 | +Create your component following the structure above. Here's a basic template: |
| 166 | + |
| 167 | +```python |
| 168 | +# component.py |
| 169 | +from kfp import dsl |
| 170 | + |
| 171 | +@dsl.component(base_image="python:3.10") |
| 172 | +def hello_world(name: str = "World") -> str: |
| 173 | +    """A simple hello world component. |
| 174 | + |
| 175 | +    Args: |
| 176 | +        name: The name to greet. Defaults to "World". |
| 177 | + |
| 178 | +    Returns: |
| 179 | +        A greeting message. |
| 180 | +    """ |
| 181 | +    message = f"Hello, {name}!" |
| 182 | +    print(message) |
| 183 | +    return message |
| 184 | +``` |
| 185 | + |
| 186 | +Write comprehensive tests for your component: |
| 187 | + |
| 188 | +```python |
| 189 | +# tests/test_component.py |
| 190 | +from ..component import hello_world |
| 191 | + |
| 192 | +def test_hello_world_default(): |
| 193 | +    """Test hello_world with default parameter.""" |
| 194 | +    # Access the underlying Python function from the component |
| 195 | +    result = hello_world.python_func() |
| 196 | +    assert result == "Hello, World!" |
| 197 | + |
| 198 | + |
| 199 | +def test_hello_world_custom_name(): |
| 200 | +    """Test hello_world with custom name.""" |
| 201 | +    result = hello_world.python_func(name="Kubeflow") |
| 202 | +    assert result == "Hello, Kubeflow!" |
| 203 | +``` |
| 204 | + |
| 205 | +### 3. Document Your Component |
| 206 | + |
| 207 | +Every component and pipeline needs a standardized `README.md`. The README generation utility in the `scripts` directory produces one for you. |
| 208 | + |
| 209 | +Read more in the [README Generator Script Documentation](./scripts/generate_readme/README.md). |
| 210 | + |
| 211 | +## Testing and Quality |
| 212 | + |
| 213 | +### Running Tests Locally |
| 214 | + |
| 215 | +Run these commands from your component/pipeline directory before submitting your contribution: |
| 216 | + |
| 217 | +```bash |
| 218 | +# Run all unit tests with coverage reporting |
| 219 | +pytest --cov=. --cov-report=html |
| 220 | + |
| 221 | +# Run specific test files when debugging |
| 222 | +pytest tests/test_component.py -v |
| 223 | +``` |
| 224 | + |
| 225 | +### Code Quality Checks |
| 226 | + |
| 227 | +Ensure your code meets quality standards: |
| 228 | + |
| 229 | +```bash |
| 230 | +# Format checking (120 character line length) |
| 231 | +black --check --line-length 120 . |
| 232 | + |
| 233 | +# Docstring validation (Google convention) |
| 234 | +pydocstyle --convention=google . |
| 235 | + |
| 236 | +# Validate metadata schema |
| 237 | +python scripts/validate_metadata.py |
| 238 | + |
| 239 | +# Run all pre-commit hooks |
| 240 | +pre-commit run --all-files |
| 241 | +``` |
| 242 | + |
| 243 | +### Building Custom Container Images |
| 244 | + |
| 245 | +If your component uses a custom image, test the container build: |
| 246 | + |
| 247 | +```bash |
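| 248 | +# Build your component image (docker needs -f because the file is named Containerfile, not Dockerfile) |
| 249 | +docker build -f components/<category>/my_component/Containerfile -t my-component:test components/<category>/my_component/ |
| 250 | + |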
| 251 | +# Test the container runs correctly |
| 252 | +docker run --rm my-component:test --help |
| 253 | +``` |
| 254 | + |
| 255 | +### CI Pipeline |
| 256 | + |
| 257 | +GitHub Actions automatically runs these checks on every pull request: |
| 258 | + |
| 259 | +- Code formatting (Black), linting (Flake8), docstring validation (pydocstyle), type checking (MyPy) |
| 260 | +- Unit and integration tests with coverage reporting |
| 261 | +- Container image builds for components with Containerfiles |
| 262 | +- Security vulnerability scans |
| 263 | +- Metadata schema validation |
| 264 | +- Standardized README content and formatting conformance |
| 265 | + |
| 266 | +## Submitting Your Contribution |
| 267 | + |
| 268 | +### Commit Your Changes |
| 269 | + |
| 270 | +Use descriptive commit messages following the [Conventional Commits](https://conventionalcommits.org/) format: |
| 271 | + |
| 272 | +```bash |
| 273 | +git add . |
| 274 | +git status # Review what you're committing |
| 275 | +git diff --cached # Check the actual changes |
| 276 | + |
| 277 | +git commit -m "feat(training): add <my_component> training component |
| 278 | + |
| 279 | +- Implements <my_component> Core-Tier component |
| 280 | +- Includes comprehensive unit tests with 95% coverage |
| 281 | +- Provides working pipeline examples |
| 282 | +- Resolves #123" |
| 283 | +``` |
| 284 | + |
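The header line of a commit message can be checked against the Conventional Commits shape before pushing. This is a minimal sketch rather than a hook shipped with this repository: the accepted type list is an assumption (the guide names only feat, fix, and docs explicitly), and `check_commit_header` is a hypothetical helper.

```python
import re

# Assumed set of accepted types; adjust to whatever the project enforces.
TYPES = ("feat", "fix", "docs", "test", "refactor", "chore", "ci", "build", "perf", "style", "revert")

# type, optional (scope), optional "!" for breaking changes, then ": subject"
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?!?: (?P<subject>\S.*)$")


def check_commit_header(message: str) -> bool:
    """Return True if the first line of `message` looks like a Conventional Commits header."""
    lines = message.splitlines()
    if not lines:
        return False
    match = HEADER.match(lines[0])
    return match is not None and match.group("type") in TYPES
```

Only the first line is inspected, so a multi-line body such as the bullet list in the example above does not affect the result.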
| 285 | +### Push and Create Pull Request |
| 286 | + |
| 287 | +Push your changes and create a pull request on GitHub: |
| 288 | + |
| 289 | +```bash |
| 290 | +git push origin component/my-component |
| 291 | +``` |
| 292 | + |
| 293 | +On GitHub, click "Compare & pull request" and fill out the provided PR template. |
| 294 | + |
| 295 | +All PRs must pass: |
| 296 | +- Automated checks (linting, tests, builds) |
| 297 | +- Code review by maintainers and community members |
| 298 | +- Documentation review |
| 299 | + |
| 300 | +### Review Process |
| 301 | + |
| 302 | +Every pull request must meet the following before it is merged: |
| 303 | +- All automated CI checks pass |
| 304 | +- Code review, in which reviewers verify that: |
| 305 | +  - The component works as described |
| 306 | +  - The code is clean and well-documented |
| 307 | +  - The included tests provide good coverage |
| 308 | +- Approval from the component OWNERS (for updates to existing components) or repository maintainers (for new components) |
| 309 | + |
| 310 | +## Getting Help |
| 311 | + |
| 312 | +- **Governance questions**: See [GOVERNANCE.md](GOVERNANCE.md) for tier requirements and processes |
| 313 | +- **Community discussion**: Join the `#kubeflow-pipelines` channel on the [CNCF Slack](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels) |
| 314 | +- **Bug reports and feature requests**: Open an issue at [GitHub Issues](https://github.com/kubeflow/pipelines-components/issues) |
| 315 | + |
| 316 | +--- |
| 317 | + |
| 318 | +This repository was established through [KEP-913: Components Repository](https://github.com/kubeflow/community/tree/master/proposals/913-components-repo). |
| 319 | + |
| 320 | +Thanks for contributing to Kubeflow Pipelines! 🚀 |