Commit c32ba72

apeirora-service-user[bot], mlenkeit, and vasu1124 committed
Release 2.1.0
Co-authored-by: Maximilian Lenkeit <[email protected]> Co-authored-by: Vasu Chandrasekhara <[email protected]>
1 parent 2a85e79 commit c32ba72

10 files changed, with 266 additions and 1 deletion

.vitepress/components/BlogPost.vue

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@
1010
</script>
1111

1212
<template>
13-
<article>
13+
<article class="blog-post">
1414
<header>
1515
<h2 v-if="title">
1616
<a :href="withBase(titleHref)">

.vitepress/theme/style.css

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -238,4 +238,20 @@
238238
.VPNavScreenMenu .VPNavScreenMenuLink[href="#--divider--"] {
239239
display: none;
240240
}
241+
}
242+
243+
/* Blog Styling */
244+
/*
245+
* Some blog posts may start with an h2 headline, which by default has
246+
* significant padding-top, margin-top and border-top, which looks
247+
* odd. This correction aligns the look with a regular paragraph
248+
*/
249+
.vp-doc h1 + article.blog-post + h2 {
250+
border-top: 0;
251+
padding-top: 0;
252+
margin-top: 16px;
253+
254+
.header-anchor {
255+
top: 0;
256+
}
241257
}

[redirect].paths.ts

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,11 @@ const redirects = [
2424
from: 'blog/2025/08/07/kubernetes-api-server-and-controller-archetypes',
2525
to: 'blog/2025-08-07-kubernetes-api-server-and-controller-archetypes.md'
2626
},
27+
// Blogs that were shared internally for review and renamed later
28+
{
29+
from: 'blog/2025-08-25-enabling-ai-workloads-on-kubernetes-with-nvidia-gpu-operator-on-gardener-gardenlinux',
30+
to: 'blog/2025-08-25-garden-linux-enabling-ai-workloads-with-nvidia-gpus.md'
31+
},
2732
// Links that were shared on social media before changing directory structure
2833
{
2934
from: 'best-practices/digital-twins/controller',
Lines changed: 186 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,186 @@
1+
---
2+
title: "Garden Linux: Enabling AI on Kubernetes with NVIDIA GPUs"
3+
authors:
4+
- pavel-pavlov
5+
- darren-hague
6+
tags:
7+
- kubernetes
8+
- gardener
9+
- gardenlinux
10+
---
11+
12+
## AI and Kubernetes: Unlocking Business Innovation
13+
14+
Artificial Intelligence (AI) has become essential for business innovation, enabling companies to unlock new revenue streams, automate processes, and make data-driven decisions automatically and at scale.
15+
16+
There is industry-wide agreement that Kubernetes provides an ideal platform for running AI workloads (see [Cloud Native AI Whitepaper](https://www.cncf.io/reports/cloud-native-artificial-intelligence-whitepaper/)). Furthermore, the CNCF community is in the process of defining infrastructure-level [AI Conformance](https://github.com/cncf/ai-conformance), which will make Kubernetes ubiquitous for AI workloads.
17+
18+
But for Kubernetes to support GPUs, the worker nodes' operating systems must be equipped with the right GPU drivers and associated access frameworks.
19+
20+
<!-- truncate -->
21+
22+
It may seem like just an obvious, pragmatic, and necessary requirement at the infrastructure level, but embedded in the fully open-source Apeiro Reference Architecture, governed and supported by (industry) members of the [NeoNephos Foundation](https://neonephos.org), its impact is substantial: **Apeiro freely empowers any organization or consortium seeking to build sovereign, modern datacenters for leveraging AI**.
23+
24+
Participation and contributions are not only welcome, but directly connect to the broader joint AI imperative of business.
25+
26+
## Simplifying NVIDIA GPU Support in Gardener
27+
28+
This is easier said than done. There is significant operational complexity to consider: multi-cloud and hybrid environments, different hardware, diverse operating systems, complex driver management, and varying cloud provider configurations.
29+
30+
In Apeiro, we offer [Gardener](https://gardener.cloud) and [Garden Linux](https://github.com/gardenlinux) to tackle such operational complexity. With the [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator), we can provide a unified AI-conformant Kubernetes platform that works across any infrastructure with [NVIDIA Data Center GPUs](https://www.nvidia.com/en-us/data-center/data-center-gpus/).
31+
32+
## Understanding the NVIDIA GPU Operator
33+
34+
The NVIDIA GPU Operator automates GPU support in Kubernetes by deploying all the required software components (drivers, CUDA, device plugins, etc.) in the right [ABI-compatible](https://en.wikipedia.org/wiki/Application_binary_interface) versions. It eliminates any manual GPU driver installation and configuration, and enables GPUs as native Kubernetes resources. The GPU Operator is a Kubernetes-native operator with custom resource definitions. Furthermore, it ensures consistent GPU functionality across different hardware nodes and configurations, while enabling automatic updates, scaling, and troubleshooting through standard Kubernetes APIs.
35+
36+
<ApeiroFigure src="/img/blog/2025-08-25-nvidia-gpu-enablement-gpu-operator.png"
37+
caption="NVIDIA GPU Operator visualization in layers"
38+
source="docs.nvidia.com"
39+
sourceLink="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/overview.html"
40+
width="100%"/>
41+
42+
## Enabling Garden Linux for the GPU Operator
43+
44+
The NVIDIA GPU Operator is architected in a modular way so anyone who wants to build GPU Driver containers can make the GPU Operator work with their operating system.
45+
This is what we have done and we are making it publicly available. We used the public NVIDIA GPU Driver Dockerfile to create functional Garden Linux GPU Driver images. Please feel free to use them and collaborate by sharing feedback within the Garden Linux
46+
[gardenlinux-nvidia-installer](https://github.com/gardenlinux/gardenlinux-nvidia-installer/) repository.
47+
48+
Garden Linux builds containers for the three latest active NVIDIA driver branches on all
49+
Garden Linux versions that are in maintenance.
50+
51+
As of August 2025, this means containerized GPU drivers for the following combinations of major releases are available:
52+
53+
| Garden Linux | NVIDIA Driver |
| --- | --- |
| [1592](https://github.com/gardenlinux/gardenlinux/issues/2161) | 570, 565, 550 |
| [1877](https://github.com/gardenlinux/gardenlinux/issues/2358) | 570, 565, 550 |
57+
58+
We automated the support directly in our build pipelines.
59+
60+
### Automating the Build
61+
62+
With guidance from NVIDIA[^thanks], Garden Linux's build and release process was adjusted to automatically publish the ABI-compatible container images required by the GPU Operator.
63+
64+
[^thanks]: Thanks to [Jathavan Sriram](https://www.linkedin.com/in/jathavansriram) from NVIDIA for the productive discussions.
65+
66+
<ApeiroFigure src="/img/blog/2025-08-25-nvidia-gpu-enablement-gpu-operator-publishing.svg"
67+
caption="Publishing workflow" />
68+
69+
An automated [workflow](https://github.com/gardenlinux/gardenlinux-nvidia-installer/blob/main/.github/workflows/update_version.yml) immediately creates a pull request for new driver versions. Hence, Garden Linux provides you with the latest GPU driver updates with zero effort! The results are published in Garden Linux's GitHub container registry [`ghcr.io/gardenlinux/gardenlinux-nvidia-installer`](https://github.com/gardenlinux/gardenlinux-nvidia-installer/pkgs/container/gardenlinux-nvidia-installer) with the [release](https://github.com/gardenlinux/gardenlinux-nvidia-installer/blob/main/.github/workflows/release.yml) workflow.
70+
71+
### Under the Hood
72+
Orchestrating the publishing of the drivers, wrapped in the correct container format needed by the NVIDIA GPU Operator, requires two major steps:
73+
74+
1. The new driver is [compiled](https://github.com/gardenlinux/gardenlinux-nvidia-installer/blob/main/.github/workflows/build_driver.yml) against the specific container-based environment and the exact [Linux Kernel](https://kernel.org/) version used in Garden Linux.
75+
76+
2. After Step 1 has completed successfully, the new driver is [packaged](https://github.com/gardenlinux/gardenlinux-nvidia-installer/blob/main/.github/workflows/build_image.yml) as an OCI container in the format the NVIDIA GPU Operator expects, so that the operator can pick it up at runtime (cf. "nvidia-driver" entry point).
77+
78+
### Example Helm Chart Configuration
79+
80+
The GPU Operator is installed using a [Helm Chart](https://helm.sh/docs/topics/charts/) provided in the NVIDIA Helm repository. Running the NVIDIA GPU Operator on Garden Linux requires a specific set of configuration values in [gpu-operator-values.yaml](https://github.com/gardenlinux/gardenlinux-nvidia-installer/blob/main/helm/gpu-operator-values.yaml).
81+
82+
For sovereign (and air-gapped) environments, you need to maintain your own repository and set it correctly in the `driver.repository` value of the Helm chart.
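As a minimal sketch of that override, the install command shown later in this post can be extended with `--set driver.repository=...`. The registry host `registry.example.internal`, the local values file path, and the helper name `gpu_operator_install_cmd` are hypothetical placeholders. The sketch also assumes the driver images have already been mirrored; mirroring the chart itself and the other operator images is not covered here.

```shell
#!/usr/bin/env bash
# Sketch: print the GPU Operator install command with a mirrored driver registry.
# Both default values below are hypothetical placeholders, not real endpoints.

gpu_operator_install_cmd() {
  local mirror="${1:-registry.example.internal/gardenlinux/gardenlinux-nvidia-installer}"
  local values_file="${2:-./gpu-operator-values.yaml}"
  # Print the command instead of running it, so it can be reviewed first.
  printf 'helm upgrade --install -n gpu-operator --create-namespace gpu-operator nvidia/gpu-operator --values %s --set driver.repository=%s\n' \
    "$values_file" "$mirror"
}
```

Calling `gpu_operator_install_cmd my.registry/path ./values.yaml` prints the full command; piping the output to `sh` would execute it once the printed line looks right.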
83+
84+
## Connecting the Dots
85+
86+
### Prerequisites
87+
88+
The example below assumes you have:
89+
90+
1. Access to a [Gardener Project](https://gardener.cloud/docs/getting-started/project/) with sufficient permissions to create a Kubernetes cluster on your preferred platform.
91+
2. Sufficient quota and permissions to create worker pools with data center-grade NVIDIA GPUs.
92+
3. An understanding of how to use Gardener and a command-line terminal.
93+
94+
### Installation Steps
95+
96+
1. Create a Kubernetes cluster
97+
98+
You can use any worker nodes with NVIDIA GPUs, including a mix of different node types.
99+
100+
2. Install Helm
101+
102+
Follow the [NVIDIA GPU Driver Getting Started Operator Installation Guide](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#operator-install-guide) to prepare Helm.
103+
104+
It is important to add the NVIDIA Helm repository before proceeding to the next step.
105+
106+
107+
3. Install the NVIDIA GPU Operator
108+
109+
You can further follow the guide from Step 2 or use the example from the [Garden Linux NVIDIA Installer](https://github.com/gardenlinux/gardenlinux-nvidia-installer). It is important to:
110+
111+
- make sure the `gpu-operator` namespace exists before installation, or add the Helm flag `--create-namespace` as shown in the command below.
112+
113+
- use the Helm flag `--values` with the value `https://raw.githubusercontent.com/gardenlinux/gardenlinux-nvidia-installer/refs/heads/main/helm/gpu-operator-values.yaml` as demonstrated below.
114+
115+
```bash
helm upgrade --install -n gpu-operator --create-namespace gpu-operator nvidia/gpu-operator --values \
  https://raw.githubusercontent.com/gardenlinux/gardenlinux-nvidia-installer/refs/heads/main/helm/gpu-operator-values.yaml
```
119+
120+
- By default, the values file above selects the latest supported version. If you need a different one, change the `driver.version` property to any version available in the [Garden Linux NVIDIA Driver Package Repository](https://github.com/gardenlinux/gardenlinux-nvidia-installer/pkgs/container/gardenlinux-nvidia-installer).
121+
122+
4. Test GPU availability (optional)
123+
124+
You can verify that the GPU Operator is working correctly using a sample job from the NVIDIA [k8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin) repository. Deploy the following GPU pod manifest:
125+
126+
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
```
144+
145+
If everything is working correctly, the container log should include the message `Test PASSED`:
146+
147+
<ApeiroFigure src="/img/blog/2025-08-25-nvidia-gpu-enablement-container-done.png"
148+
alt="Example of container logs"
149+
caption="Example of container logs" />
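Steps 2 and 4 above can be sketched as shell functions. The Helm repository URL is the one given in NVIDIA's installation guide. The file name `gpu-pod.yaml` and the helper names are hypothetical, and the sketch assumes the pod manifest above was saved under that name in the current directory.

```shell
#!/usr/bin/env bash
# Sketch of steps 2 and 4. Function and file names are illustrative assumptions.

# Step 2: register the NVIDIA Helm repository and refresh the local index.
add_nvidia_repo() {
  helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
  helm repo update
}

# Succeeds if the log text on stdin contains the success marker of the sample.
check_vectoradd_log() {
  grep -q 'Test PASSED'
}

# Step 4: run the sample pod, wait for it to finish, then inspect its log.
run_gpu_smoke_test() {
  kubectl apply -f gpu-pod.yaml
  # Requires a kubectl version that supports waiting on a JSONPath condition.
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/gpu-pod --timeout=300s
  if kubectl logs gpu-pod | check_vectoradd_log; then
    echo "GPU test passed"
  else
    echo "GPU test failed" >&2
    return 1
  fi
}
```

Running `add_nvidia_repo` before the install command and `run_gpu_smoke_test` after it gives a quick pass or fail signal without reading the log by hand.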
150+
151+
## Gardener Integration
152+
153+
With the NVIDIA GPU Operator working out of the box, we are planning to offer a complete end-to-end experience that lets end users order a Kubernetes cluster via Gardener with everything preconfigured, as a service. We will work with the community on a Gardener Enhancement Proposal (GEP), with the goal of presenting the integrated experience as an extension like the one shown below.
154+
155+
```yaml
kind: Shoot
...
spec:
  extensions:
    - type: nvidia-gpu-extension
      providerConfig:
        cdi:
          enabled: true
          default: true
        toolkit:
          installDir: /opt/nvidia
        driver:
          imagePullPolicy: Always
          usePrecompiled: true
          repository: ghcr.io/gardenlinux/gardenlinux-nvidia-installer
...
```
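For comparison, the same settings can be written as Helm `--set` flags for a manual installation. This is only a sketch: it assumes the keys of the proposed `providerConfig` map one-to-one onto the NVIDIA GPU Operator chart values of the same names, which no published GEP confirms yet, and the helper name `gpu_operator_set_flags` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: emit the proposed extension settings as Helm --set flags, one per line.
# The one-to-one key mapping is an assumption, not a confirmed design.

gpu_operator_set_flags() {
  printf '%s\n' \
    '--set cdi.enabled=true' \
    '--set cdi.default=true' \
    '--set toolkit.installDir=/opt/nvidia' \
    '--set driver.imagePullPolicy=Always' \
    '--set driver.usePrecompiled=true' \
    '--set driver.repository=ghcr.io/gardenlinux/gardenlinux-nvidia-installer'
}
```

The output can be appended to an install command through unquoted command substitution, for example `helm upgrade --install -n gpu-operator --create-namespace gpu-operator nvidia/gpu-operator $(gpu_operator_set_flags)`.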
173+
174+
## Demo Video
175+
176+
Watch our 5-minute demo and see how it works end-to-end!
177+
178+
<ApeiroFigure src="/img/blog/2025-08-25-nvidia-gpu-enablement-youtube-cover.png"
179+
caption="5-minute demo video on YouTube"
180+
href="https://youtu.be/7_e7mTvQFsU" />
181+
182+
## Outlook and Support
183+
184+
Our Apeiro community encourages you to share feedback or report any issues you encounter while using the NVIDIA GPU Operator on Garden Linux. Please open an issue in the [gardenlinux-nvidia-installer](https://github.com/gardenlinux/gardenlinux-nvidia-installer/issues) repository.
185+
186+
The team values your contributions and is eager to hear about your experience.

blog/authors.yml

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,15 +4,37 @@
44
# url: [url]
55
vasu-chandrasekhara:
66
name: Vasu Chandrasekhara
7+
title: Apeiro - Chief Product Owner at SAP
8+
url: https://www.linkedin.com/in/vasu1124/
79
uwe-krueger:
810
name: Uwe Krüger
11+
title: Senior Software Architect at SAP (retired)
12+
url: https://github.com/mandelsoft
913
mangirdas-judeikis:
1014
name: Mangirdas Judeikis
15+
title: Founder/Developer at Synpse
16+
url: https://www.linkedin.com/in/mangirdas/
1117
maximilian-lenkeit:
1218
name: Maximilian Lenkeit
19+
title: Development Architect at SAP
20+
url: https://www.linkedin.com/in/mlenkeit/
1321
simon-heimler:
1422
name: Simon Heimler
23+
title: Principal Architect at SAP
24+
url: https://www.linkedin.com/in/simonheimler/
1525
vyshnavi-gadamsetti:
1626
name: Vyshnavi Gadamsetti
27+
title: Associate Development Architect at SAP
28+
url: https://www.linkedin.com/in/vyga/
1729
erwin-margewitsch:
1830
name: Erwin Margewitsch
31+
title: Architect at SAP
32+
url: https://www.linkedin.com/in/erwin-margewitsch-9672bb89/
33+
pavel-pavlov:
34+
name: Pavel Pavlov
35+
title: Product Manager at SAP
36+
url: https://www.linkedin.com/in/pavelnpavlov
37+
darren-hague:
38+
name: Darren Hague
39+
title: AI Platform Architect at SAP
40+
url: https://www.linkedin.com/in/darrenhague