Enroot overlay filesystem compatibility issue in Kubernetes environments #42

@kotarusv

Description

Environment:

Slinky Operator: v0.3.1
Kubernetes
Storage: local-path storage class (overlay filesystem)
Images: ghcr.io/slinkyproject/login-pyxis:25.05-ubuntu24.04, ghcr.io/slinkyproject/slurmd-pyxis:25.05-ubuntu24.04

Issue Summary:
We have container support working with Slinky, but enroot fails with overlay filesystem errors on complex container images. Simple images run correctly; multi-layer images fail during enroot's overlay filesystem operations.

Two Approaches Tested:

Approach 1: OCI Runtime Integration

slurm:
  configFiles:
    oci.conf: |
      OCIRuntime=/usr/bin/enroot
      EnvExclude=HOME,MAIL,USER,SHELL,LOGNAME
      RunTimeCreate=/usr/bin/enroot create
      RunTimeDelete=/usr/bin/enroot remove

Approach 2: Pyxis SPANK Plugin

login:
  image:
    repository: ghcr.io/slinkyproject/login-pyxis
compute:
  image:
    repository: ghcr.io/slinkyproject/slurmd-pyxis
slurm:
  configFiles:
    plugstack.conf: |
      include /usr/share/pyxis/*

Error (identical in both approaches):
enroot-aufs2ovlfs: failed to create ovlfs whiteout: /tmp/enroot.*/operation not permitted

Failing scenario:

kubectl exec -n slurm deployment/slurm-login -- srun --partition=debug --container-image=postgres:13 date
srun: unrecognized option '--container-image=postgres:13'
Try "srun --help" for more information
command terminated with exit code 255
(venv) [ubuntu@skotaru-lnx slinky (⎈ |kai-pdc-oidc:slurm)]$ kubectl describe pod -n slurm slurm-login-b666757b8-7bv9v | grep -i image
Image: ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04
Image ID: ghcr.io/slinkyproject/sackd@sha256:fe75328f91b22600261e5b65fa877830703608e9ea38eb3454ccaf28ed8407fb
Image: ghcr.io/slinkyproject/login:25.05-ubuntu24.04
Image ID: ghcr.io/slinkyproject/login@sha256:938c5abe666325ca00525fe9efe2209000638a703be717cd3f4b50882bb28fc8
Normal Pulled 66s kubelet Container image "ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04" already present on machine
Normal Pulled 66s kubelet Container image "ghcr.io/slinkyproject/login:25.05-ubuntu24.04" already present on machine
(venv) [ubuntu@skotaru-lnx slinky (⎈ |kai-pdc-oidc:slurm)]$ kubectl describe pod -n slurm slurm-compute-debug-0 | grep -i image
Image: ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04
Image ID: ghcr.io/slinkyproject/sackd@sha256:fe75328f91b22600261e5b65fa877830703608e9ea38eb3454ccaf28ed8407fb
Image: ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04
Image ID: ghcr.io/slinkyproject/sackd@sha256:fe75328f91b22600261e5b65fa877830703608e9ea38eb3454ccaf28ed8407fb
Image: ghcr.io/slinkyproject/slurmd:25.05-ubuntu24.04
Image ID: ghcr.io/slinkyproject/slurmd@sha256:759a0573d18597ed39dcc41e6b6c6060a85d72cdd828ccf4217a13c57717d002
Normal Pulled 65s kubelet Container image "ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04" already present on machine
Normal Pulled 64s kubelet Container image "ghcr.io/slinkyproject/sackd:25.05-ubuntu24.04" already present on machine
Normal Pulled 63s kubelet Container image "ghcr.io/slinkyproject/slurmd:25.05-ubuntu24.04" already present on machine
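
In the describe output above, the pods report the plain login and slurmd images rather than the login-pyxis and slurmd-pyxis variants listed under Environment, and srun rejects --container-image in the same run. A small check like the following makes that kind of mismatch easy to spot. It is a minimal sketch: the function names and the default component list are hypothetical, and it only assumes the "Image:" line format shown above.

```python
import re


def images_from_describe(text):
    """Return image references from the 'Image:' lines of `kubectl describe pod` output."""
    images = []
    for line in text.splitlines():
        # Matches "Image: <ref>" only; "Image ID:" and event lines are skipped.
        match = re.match(r"\s*Image:\s+(\S+)", line)
        if match:
            images.append(match.group(1))
    return images


def missing_pyxis(images, components=("login", "slurmd")):
    """Return images for the given components that are not the -pyxis variant.

    `components` is an assumption based on the image names in this issue.
    """
    flagged = []
    for image in images:
        # ghcr.io/org/login:tag -> login (drop registry path, digest, and tag)
        name = image.rsplit("/", 1)[-1].split("@")[0].split(":")[0]
        if name in components:
            flagged.append(image)
    return flagged
```

Feeding it the text from `kubectl describe pod -n slurm <pod>` and getting a non-empty result would suggest the pyxis images were not actually deployed for that pod.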

Successful scenario:

kubectl exec -n slurm deployment/slurm-login -- srun --partition=debug --container-image=alpine date
pyxis: importing docker image: alpine
pyxis: imported docker image: alpine
Wed Sep 3 21:35:37 UTC 2025

Working Examples:
alpine:latest - Success
python:3.9 - Success
ubuntu:20.04 - Success

Failing Examples:
postgres:13 - Overlay filesystem error
nvcr.io#nvidia/pytorch:23.10-py3 - Overlay filesystem error
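
One possible, unverified explanation for the split between working and failing images: the error is raised while creating an overlayfs whiteout, and OCI image layers record deleted files as entries whose basename starts with ".wh.". An image whose layers contain no such entries would never reach that step. The sketch below lists whiteout entries in a single layer tarball so the two groups can be compared. The function name is hypothetical, and how the layer tarball is obtained is left open.

```python
import tarfile

# OCI layer convention: a deleted path is recorded as ".wh.<name>",
# and an opaque directory as ".wh..wh..opq".
WHITEOUT_PREFIX = ".wh."


def whiteout_entries(layer_tar):
    """Return member names in a layer tarball whose basename marks a whiteout.

    `layer_tar` is a seekable binary file object, e.g. open(path, "rb").
    """
    found = []
    with tarfile.open(fileobj=layer_tar, mode="r:*") as tar:
        for member in tar:
            basename = member.name.rsplit("/", 1)[-1]
            if basename.startswith(WHITEOUT_PREFIX):
                found.append(member.name)
    return found
```

If the failing images have whiteout entries and the working ones do not, that would point at whiteout creation itself rather than at layer count.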

Questions:
Is enroot/pyxis officially supported in Kubernetes environments with overlay storage?
Are there recommended configurations for avoiding overlay-on-overlay filesystem conflicts?
Should we expect limitations with complex multi-layer container images in this environment?

Additional Context:
The underlying issue appears to be enroot attempting to create overlay filesystems on top of Kubernetes' existing overlay storage, resulting in nested overlay operations that fail with permission errors.
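
To test the nested-overlay hypothesis, it helps to confirm which filesystem actually backs the path in the error message (/tmp inside the compute pod). The following is a minimal sketch that resolves a path against the contents of /proc/mounts; the function name is hypothetical, and it does not handle mount points containing escaped spaces.

```python
def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    `mounts_text` is the content of /proc/mounts: one mount per line with
    whitespace-separated fields (device, mount point, fs type, options, ...).
    Returns None if no mount point matches.
    """
    best = None
    target = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if target.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        print(fs_type_for("/tmp", handle.read()))
```

Run inside the slurmd container, a result with fs_type "overlay" for /tmp would be consistent with enroot building its overlay on top of the container's own overlay root.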

Srinivas Kotaru
