feat(backend): Replace MLMD with KFP Server APIs #12430
Open
+64,260
−18,456
Description of your changes:
This PR removes MLMD as per the KEP here
Resolves: #11760
Overview
Core Change: Replaced MLMD (ML Metadata) service with direct database storage via KFP API server.
This is a major architectural shift that eliminates the external ML Metadata service dependency and consolidates all artifact and task metadata operations directly into the KFP API server with MySQL/database backend.
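As a rough illustration of what "direct database storage" means for the artifact and artifact-task operations named later in this description (`CreateArtifact`, `CreateArtifactTask`, `ListArtifactTasks`), here is a minimal in-memory sketch. It is not the PR's implementation: the `MetadataStore` type, the struct fields, the `IsOutput` flag, and the ID format are all hypothetical, and the real messages are defined in `backend/api/v2beta1/artifact.proto`.

```go
package main

import "fmt"

// Artifact is a hypothetical, simplified record. The real message is
// defined in backend/api/v2beta1/artifact.proto and will differ.
type Artifact struct {
	ID   string
	Name string
	URI  string
}

// ArtifactTask links one artifact to one task. IsOutput is an assumed
// flag: true if the task produced the artifact, false if it consumed it.
type ArtifactTask struct {
	ArtifactID string
	TaskID     string
	IsOutput   bool
}

// MetadataStore stands in for the database-backed stores in the API
// server; here everything is kept in memory for illustration only.
type MetadataStore struct {
	artifacts map[string]Artifact
	links     []ArtifactTask
	nextID    int
}

// NewMetadataStore returns an empty store.
func NewMetadataStore() *MetadataStore {
	return &MetadataStore{artifacts: make(map[string]Artifact)}
}

// CreateArtifact stores an artifact and assigns it a sequential ID.
func (s *MetadataStore) CreateArtifact(name, uri string) Artifact {
	s.nextID++
	a := Artifact{ID: fmt.Sprintf("artifact-%d", s.nextID), Name: name, URI: uri}
	s.artifacts[a.ID] = a
	return a
}

// CreateArtifactTask records that a task produced or consumed an
// artifact. It fails if the artifact has not been created first.
func (s *MetadataStore) CreateArtifactTask(artifactID, taskID string, isOutput bool) error {
	if _, ok := s.artifacts[artifactID]; !ok {
		return fmt.Errorf("artifact %q not found", artifactID)
	}
	s.links = append(s.links, ArtifactTask{
		ArtifactID: artifactID,
		TaskID:     taskID,
		IsOutput:   isOutput,
	})
	return nil
}

// ListArtifactTasks returns the links for one task in insertion order.
func (s *MetadataStore) ListArtifactTasks(taskID string) []ArtifactTask {
	var out []ArtifactTask
	for _, l := range s.links {
		if l.TaskID == taskID {
			out = append(out, l)
		}
	}
	return out
}
```

In this sketch a training task would call `CreateArtifact` for its model and then `CreateArtifactTask` with `isOutput` set to true, and a downstream task would link the same artifact ID with `isOutput` set to false. The point is only that lineage becomes a pair of tables owned by the API server rather than records in a separate metadata service.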
Components Removed
MLMD Service Infrastructure
backend/metadata_writer/
backend/src/v2/metadata/
Deployment Changes
Components Added
New API Layer
Artifact Service API (backend/api/v2beta1/artifact.proto)
CRUD Operations:
CreateArtifact - Create single artifact
GetArtifact - Retrieve artifact by ID
ListArtifacts - Query artifacts with filtering
BatchCreateArtifacts - Bulk artifact creation
Artifact Task Operations:
CreateArtifactTask - Track artifact usage in tasks
ListArtifactTasks - Query artifact-task relationships
BatchCreateArtifactTasks - Bulk task-artifact linking
Generated Clients:
Extended Run Service API (backend/api/v2beta1/run.proto)
New Task Endpoints:
CreateTask - Create pipeline task execution record
GetTask - Retrieve task details
ListTasks - Query tasks with filtering
UpdateTask - Update task status/metadata
BatchUpdateTasks - Efficient bulk task updates
ViewMode Feature:
BASIC - Minimal response (IDs, status, timestamps)
RUNTIME_ONLY - Include runtime details without full spec
FULL - Complete task/run details with spec
Storage Layer
Artifact Storage (backend/src/apiserver/storage/artifact_store.go)
Artifact Task Store (backend/src/apiserver/storage/artifact_task_store.go)
Enhanced Task Store (backend/src/apiserver/storage/task_store.go)
API Server Implementation
Artifact Server (backend/src/apiserver/server/artifact_server.go)
Extended Run Server (backend/src/apiserver/server/run_server.go)
Client Infrastructure
KFP API Client (backend/src/v2/apiclient/)
Driver/Launcher Refactoring
Parameter/Artifact Resolution (backend/src/v2/driver/resolver/)
resolve.go (~1,100 lines removed)
parameters.go - Parameter resolution (~560 lines)
artifacts.go - Artifact resolution (~314 lines)
resolve.go - Orchestration (~90 lines)
Driver Changes (backend/src/v2/driver/)
Launcher Changes (backend/src/v2/cmd/launcher-v2/)
Batch Updater (backend/src/v2/component/batch_updater.go)
Testing Infrastructure
Test Data Pipelines (backend/src/v2/driver/test_data/)
cache_test.yaml - Cache hit/miss scenarios
componentInput.yaml - Input parameter testing
k8s_parameters.yaml - Kubernetes-specific features
oneof_simple.yaml - Conditional execution
nested_naming_conflicts.yaml - Name resolution edge cases
Test Coverage
Utility Additions
Scope Path (backend/src/common/util/scope_path.go)
Proto Helpers (backend/src/common/util/proto_helpers.go)
YAML Parser (backend/src/common/util/yaml_parser.go)
Key Behavioral Changes
Artifact Tracking
Task State Management
Performance Optimizations
API Response Size
ListRuns with VIEW_MODE=DEFAULT: ~80% smaller payloads
Migration Considerations
Database Schema
artifacts, artifact_tasks
tasks table with new columns
Backwards Compatibility
Deployment
Testing Strategy
Unit Tests
Integration Tests
Golden File Updates
Files Changed Summary
Breakdown
Risks & Considerations
Testing
Performance
Operational
Recommended Follow-up
Conclusion
This is an architectural improvement that:
Checklist: