Description
Environment:
Slinky Operator: v0.3.1
Kubernetes
Storage: local-path storage class (overlay filesystem)
Images: ghcr.io/slinkyproject/login-pyxis:25.05-ubuntu24.04, ghcr.io/slinkyproject/slurmd-pyxis:25.05-ubuntu24.04
Issue Summary:
We have container support working with Slinky, but some container images fail with overlay filesystem errors. Simple images work correctly, while multi-layer images fail during enroot's overlay filesystem operations.
Two Approaches Tested:
Approach 1: OCI Runtime Integration
slurm:
  configFiles:
    oci.conf: |
      OCIRuntime=/usr/bin/enroot
      EnvExclude=HOME,MAIL,USER,SHELL,LOGNAME
      RunTimeCreate=/usr/bin/enroot create
      RunTimeDelete=/usr/bin/enroot remove
Approach 2: Pyxis SPANK Plugin
login:
  image:
    repository: ghcr.io/slinkyproject/login-pyxis
compute:
  image:
    repository: ghcr.io/slinkyproject/slurmd-pyxis
slurm:
  configFiles:
    plugstack.conf: |
      include /usr/share/pyxis/*
Error (identical in both approaches):
enroot-aufs2ovlfs: failed to create ovlfs whiteout: /tmp/enroot.*/operation not permitted
Failing scenario:
kubectl exec -n slurm deployment/slurm-login -- srun --partition=debug --container-image=postgres:13 date
srun: unrecognized option '--container-image=postgres:13'
Try "srun --help" for more information
command terminated with exit code 255
$ kubectl describe pod -n slurm slurm-login-b666757b8-7bv9v | grep -i image
Image: ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04
Image ID: ghcr.io/slinkyproject/sackd@sha256:fe75328f91b22600261e5b65fa877830703608e9ea38eb3454ccaf28ed8407fb
Image: ghcr.io/slinkyproject/login:25.05-ubuntu24.04
Image ID: ghcr.io/slinkyproject/login@sha256:938c5abe666325ca00525fe9efe2209000638a703be717cd3f4b50882bb28fc8
Normal Pulled 66s kubelet Container image "ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04" already present on machine
Normal Pulled 66s kubelet Container image "ghcr.io/slinkyproject/login:25.05-ubuntu24.04" already present on machine
$ kubectl describe pod -n slurm slurm-compute-debug-0 | grep -i image
Image: ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04
Image ID: ghcr.io/slinkyproject/sackd@sha256:fe75328f91b22600261e5b65fa877830703608e9ea38eb3454ccaf28ed8407fb
Image: ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04
Image ID: ghcr.io/slinkyproject/sackd@sha256:fe75328f91b22600261e5b65fa877830703608e9ea38eb3454ccaf28ed8407fb
Image: ghcr.io/slinkyproject/slurmd:25.05-ubuntu24.04
Image ID: ghcr.io/slinkyproject/slurmd@sha256:759a0573d18597ed39dcc41e6b6c6060a85d72cdd828ccf4217a13c57717d002
Normal Pulled 65s kubelet Container image "ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04" already present on machine
Normal Pulled 64s kubelet Container image "ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04" already present on machine
Normal Pulled 63s kubelet Container image "ghcr.io/slinkyproject/slurmd:25.05-ubuntu24.04" already present on machine
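In this run, both pods report the stock `login` and `slurmd` images rather than the `login-pyxis` and `slurmd-pyxis` repositories configured in Approach 2. Without the pyxis SPANK plugin loaded, srun does not know `--container-image`, which would be consistent with the `unrecognized option` error above. A small check like the Python sketch below can flag pods that are still on the non-pyxis images. The script name `check_pyxis_images.py` and the `PYXIS_VARIANTS` mapping are assumptions based on the repositories named in this issue; the script takes image references as arguments, for example from `kubectl get pod -n slurm slurm-compute-debug-0 -o jsonpath='{.spec.containers[*].image}'`.

```python
import sys

# Assumption: the pyxis-enabled variants are the repositories named in
# Approach 2; extend this mapping if other images should be checked.
PYXIS_VARIANTS = {"login": "login-pyxis", "slurmd": "slurmd-pyxis"}


def repo_basename(image: str) -> str:
    """Return the last repository path component, without tag or digest."""
    name = image.split("@", 1)[0]   # drop any @sha256:... digest
    last = name.rsplit("/", 1)[-1]  # e.g. "login:25.05-ubuntu24.04"
    return last.split(":", 1)[0]    # drop the tag


def missing_pyxis(images: list[str]) -> list[str]:
    """Return images from a stock repository that has a pyxis variant."""
    return [img for img in images if repo_basename(img) in PYXIS_VARIANTS]


if __name__ == "__main__":
    for image in missing_pyxis(sys.argv[1:]):
        wanted = PYXIS_VARIANTS[repo_basename(image)]
        print(f"{image}: expected repository '{wanted}'")
```

Usage (hypothetical script name): `python3 check_pyxis_images.py $(kubectl get pod -n slurm slurm-compute-debug-0 -o jsonpath='{.spec.containers[*].image}')`. The `sackd` sidecar image is deliberately not flagged, since no pyxis variant of it is mentioned here.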
Successful scenario:
kubectl exec -n slurm deployment/slurm-login -- srun --partition=debug --container-image=alpine date
pyxis: importing docker image: alpine
pyxis: imported docker image: alpine
Wed Sep 3 21:35:37 UTC 2025
Working Examples:
alpine:latest - Success
python:3.9 - Success
ubuntu:20.04 - Success
Failing Examples:
postgres:13 - Overlay filesystem error
nvcr.io#nvidia/pytorch:23.10-py3 - Overlay filesystem error
Questions:
Is enroot/pyxis officially supported in Kubernetes environments with overlay storage?
Are there recommended configurations for avoiding overlay-on-overlay filesystem conflicts?
Should we expect limitations with complex multi-layer container images in this environment?
Additional Context:
The underlying issue appears to be enroot attempting to create overlay filesystems on top of Kubernetes' existing overlay storage, resulting in nested overlay operations that fail with permission errors.
Srinivas Kotaru