Description
What you would like to be added?
Kubernetes Jobs and JobSets support a TTL field that enables automatic cleanup of finished resources. The TrainJob resource allows passing these TTLs down to the JobSet/Job level, which works for cleaning up pods, but the TrainJob resource itself remains indefinitely. It would be helpful to have a TTL option on the TrainJob resource itself, so that we don't have to build additional systems on top to clean them up.
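To make the request concrete, here is a minimal sketch of the expiry check a TrainJob TTL could perform. It is modelled loosely on how a TTL in seconds after completion behaves for Jobs. The function name, the `ttl_seconds` parameter, and the idea of a recorded finish time on the TrainJob are assumptions for illustration only and are not part of the current TrainJob API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ttl_expired(
    finished_at: Optional[datetime],
    ttl_seconds: Optional[int],
    now: datetime,
) -> bool:
    """Return True when a finished resource has outlived its TTL.

    finished_at: when the (hypothetical) TrainJob finished, or None if still running.
    ttl_seconds: hypothetical TTL setting; None means "never clean up".
    now:         current time, passed in so the check is easy to test.
    """
    # A job that is still running, or has no TTL configured, is never expired.
    if finished_at is None or ttl_seconds is None:
        return False
    # Expired once the TTL window after completion has fully elapsed.
    return now >= finished_at + timedelta(seconds=ttl_seconds)


if __name__ == "__main__":
    finished = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    # One hour TTL, checked two hours after the job finished.
    print(ttl_expired(finished, 3600, finished + timedelta(hours=2)))
```

A controller could run a check like this whenever it reconciles a finished TrainJob and delete the resource once it returns true, which would remove the need for an external cron job.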
Why is this needed?
Without this, all TrainJob resources remain on the cluster indefinitely after they have finished running, and users must either clean them up manually or create cron jobs to do so. It would be good to have parity with Kubernetes Jobs and JobSets.
Love this feature?
Give it a 👍 We prioritize the features with most 👍