[BUG] - GPU images are incorrectly built from plain Ubuntu base instead of CUDA base #234

@marcelovilla

Describe the bug

Our "Build Docker Images" GHA workflow includes logic to select a different base image depending on the target platform. CPU builds use a standard Ubuntu base image, while GPU builds use a CUDA-enabled base image.

See these relevant lines in the workflow:

GPU_BASE_IMAGE: nvidia/cuda:12.8.1-base-ubuntu24.04
GPU_IMAGE_SUFFIX: gpu

- name: "Set BASE_IMAGE and Image Suffix 📷"
  if: ${{ matrix.platform == 'gpu' }}
  run: |
    echo "GPU Platform Matrix"
    echo "BASE_IMAGE=$GPU_BASE_IMAGE" >> $GITHUB_ENV
    echo "IMAGE_SUFFIX=-$GPU_IMAGE_SUFFIX" >> $GITHUB_ENV

build-args: BASE_IMAGE=${{ env.BASE_IMAGE }}


It used to be the case that our Dockerfiles accepted an ARG for the base image. For example:

ARG BASE_IMAGE=ubuntu:20.04

However, #211 appears to have changed that behavior unintentionally. I don’t think we noticed the issue until now, while reviewing #229. Looking at some of our recent image builds, I can confirm the CPU and GPU images are indeed using the same base image.
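
One way to catch this kind of regression early is a small guard in CI. The following is a minimal sketch, not the project's actual check: the function name `dockerfile_uses_base_image_arg` is hypothetical, and it assumes the Dockerfile is meant to consume `BASE_IMAGE` through an uppercase `FROM` line. When a `FROM` line hardcodes the image instead, that image is used regardless of what `build-args` passes.

```shell
# Hypothetical helper (not part of the repository).
# Succeeds if at least one FROM line in the given Dockerfile references the
# BASE_IMAGE build argument, written as $BASE_IMAGE or ${BASE_IMAGE}.
# An optional --platform=... flag before the image reference is tolerated.
dockerfile_uses_base_image_arg() {
  grep -Eq '^[[:space:]]*FROM[[:space:]]+(--platform=[^[:space:]]+[[:space:]]+)?\$\{?BASE_IMAGE' "$1"
}
```

It could be called as `dockerfile_uses_base_image_arg path/to/Dockerfile || exit 1` in a workflow step, where the path is a placeholder for whichever Dockerfile is being built.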

Taken from the GHA logs of the jupyterlab-cpu build for the 2025.10.1rc1 tag:

#7 [linux/amd64 builder 1/4] FROM docker.io/library/ubuntu:24.04@sha256:66460d557b25769b102175144d538d88219c077c678a49af4afca6fbfc1b5252

Taken from the GHA logs of the jupyterlab-gpu build for the 2025.10.1rc1 tag:

#9 [linux/amd64 builder 1/4] FROM docker.io/library/ubuntu:24.04@sha256:66460d557b25769b102175144d538d88219c077c678a49af4afca6fbfc1b5252

The image digests are exactly the same, which is not the expected behavior. I think we should fix this so that GPU images are built on top of the CUDA base image. I know @dcmcand has some thoughts on how to implement this without relying on an ARG in our Dockerfiles.

We've been building images this way for some months now and haven't noticed any issues when running GPU workloads on Nebari. That makes me wonder why things still work as before if we're no longer relying on the CUDA images. I guess it might be because we still rely on the NVIDIA device plugin DaemonSet.
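
The manual comparison of the CPU and GPU build logs could also be scripted. This is a rough sketch under stated assumptions: the function names are hypothetical, the logs are assumed to be saved to local files, and only the first `FROM` line in each log is considered, shaped like the BuildKit lines quoted in this issue.

```shell
# Hypothetical helpers (not part of the repository).

# Print the image reference from the first "FROM" step in a build log.
# Assumes lines shaped like:
#   #7 [linux/amd64 builder 1/4] FROM docker.io/library/ubuntu:24.04@sha256:...
base_image_from_log() {
  sed -n 's/.*] FROM \([^ ]*\).*/\1/p' "$1" | head -n 1
}

# Succeed only if both logs contain a FROM step and the two resolved
# base images (including digests) are different.
base_images_differ() {
  cpu_base="$(base_image_from_log "$1")"
  gpu_base="$(base_image_from_log "$2")"
  [ -n "$cpu_base" ] && [ -n "$gpu_base" ] && [ "$cpu_base" != "$gpu_base" ]
}
```

With the two log excerpts quoted in this issue, `base_images_differ cpu.log gpu.log` would fail, since both resolve to the same `ubuntu:24.04` digest.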

Expected behavior

GPU images should build on top of a CUDA base image.

How to Reproduce the problem?

Take a look at the "Build Docker Images" GHA workflow.

Command output

Versions and dependencies used.

No response

Anything else?

No response
