44 changes: 44 additions & 0 deletions ai-ml/mldiagnostics-webhook-and-operator/READMDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
## mldiagnostics-webhook-and-operator

This directory provides Helm charts for the mldiagnostics injection webhook and connection operator, which are needed to integrate the mldiagnostics SDK in GKE.



### Install cert-manager if not already installed

cert-manager is a prerequisite for the injection webhook. If it is not already installed, install it with the commands below. After cert-manager is installed, it may take up to two minutes for the certificate to become ready.

```bash
helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.13.0 \
  --set installCRDs=true \
  --set global.leaderElection.namespace=cert-manager \
  --timeout 10m
```
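
Rather than waiting a fixed two minutes, one option is to block until the cert-manager deployments report `Available`. The sketch below wraps `kubectl wait` in a small helper. The function name `wait_for_cert_manager` and the 120-second default are illustrative assumptions, not part of this repository, and the check covers the deployments only, not any individual `Certificate` resource.

```bash
# Hypothetical helper (not part of this repo): wait until every Deployment
# in the cert-manager namespace reports the Available condition.
wait_for_cert_manager() {
  local namespace="${1:-cert-manager}"  # namespace used by the install above
  local timeout="${2:-120s}"            # assumed default, mirroring "up to two minutes"
  kubectl wait --for=condition=Available deployment --all \
    --namespace "${namespace}" --timeout "${timeout}"
}

# Usage:
#   wait_for_cert_manager                     # defaults
#   wait_for_cert_manager cert-manager 300s   # longer timeout
```

`kubectl wait` exits non-zero if the timeout elapses, so the helper can gate the webhook install in a script.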

### Install injection-webhook

```bash
helm install mldiagnostics-injection-webhook \
  --namespace=gke-diagon \
  --create-namespace \
  ./injection-webhook/chart
```
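
To confirm the release came up, it may help to look at the Helm release status and the pods in the namespace. This is a minimal sketch; the helper name `check_release` is hypothetical, and the default namespace simply mirrors the install command above.

```bash
# Hypothetical helper (not part of this repo): print the Helm release status,
# then list the pods in the release namespace.
check_release() {
  local release="$1"
  local namespace="${2:-gke-diagon}"  # namespace used by the install commands
  helm status "${release}" --namespace "${namespace}" &&
    kubectl get pods --namespace "${namespace}"
}

# Usage:
#   check_release mldiagnostics-injection-webhook
```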


### Install connection-operator

```bash
helm install mldiagnostics-connection-operator \
  --namespace=gke-diagon \
  --create-namespace \
  ./connection-operator/chart
```
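
The operator's Deployment template reads its replica count from `.Values.diagonConnectionOperator.replicas`, so the value can presumably be overridden at install time. The sketch below assumes that key is the one to set (the chart's `values.yaml` is not reproduced here), and the helper name `install_operator` is hypothetical.

```bash
# Hypothetical helper (not part of this repo): install or upgrade the
# connection operator with an explicit replica count.
# Assumption: diagonConnectionOperator.replicas is the value key, as
# referenced by the chart's Deployment template.
install_operator() {
  local replicas="${1:-1}"
  helm upgrade --install mldiagnostics-connection-operator \
    ./connection-operator/chart \
    --namespace gke-diagon \
    --create-namespace \
    --set "diagonConnectionOperator.replicas=${replicas}"
}

# Usage:
#   install_operator 2
```

`helm upgrade --install` behaves like `helm install` on a fresh cluster and like an upgrade when the release already exists, which makes the command safe to re-run.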

@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
@@ -0,0 +1,21 @@
apiVersion: v2
name: mldiagnostics-connection-operator
description: A Helm chart for the operator that captures profiler traces, based on the MLDiagnosticsConnection custom resource, for the JAX, PyTorch/XLA, and TensorFlow frameworks.
# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "0.1.0"
@@ -0,0 +1,62 @@
{{/*
Expand the name of the chart.
*/}}
{{- define "chart.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "chart.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "chart.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Common labels
*/}}
{{- define "chart.labels" -}}
helm.sh/chart: {{ include "chart.chart" . }}
{{ include "chart.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/*
Selector labels
*/}}
{{- define "chart.selectorLabels" -}}
app.kubernetes.io/name: {{ include "chart.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{/*
Create the name of the service account to use
*/}}
{{- define "chart.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "chart.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
@@ -0,0 +1,71 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "chart.fullname" . }}
  labels:
    control-plane: diagon-connection-operator
  {{- include "chart.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.diagonConnectionOperator.replicas }}
  selector:
    matchLabels:
      app.kubernetes.io/name: connection-operator
      control-plane: diagon-connection-operator
    {{- include "chart.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: connection-operator
        control-plane: diagon-connection-operator
      {{- include "chart.selectorLabels" . | nindent 8 }}
      annotations:
        kubectl.kubernetes.io/default-container: controller
    spec:
      containers:
      - args: {{- toYaml .Values.diagonConnectionOperator.controller.args | nindent 8 }}
        command:
        - /manager
        env:
        - name: KUBERNETES_CLUSTER_DOMAIN
          value: {{ quote .Values.kubernetesClusterDomain }}
        image: {{ .Values.diagonConnectionOperator.controller.image.repository }}:{{ .Values.diagonConnectionOperator.controller.image.tag
          | default .Chart.AppVersion }}
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8081
          initialDelaySeconds: 15
          periodSeconds: 20
        name: controller
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8081
          initialDelaySeconds: 5
          periodSeconds: 10
        resources: {{- toYaml .Values.diagonConnectionOperator.controller.resources | nindent
          10 }}
        securityContext: {{- toYaml .Values.diagonConnectionOperator.controller.containerSecurityContext
          | nindent 10 }}
      - env:
        - name: KUBERNETES_CLUSTER_DOMAIN
          value: {{ quote .Values.kubernetesClusterDomain }}
        image: '{{ .Values.diagonConnectionOperator.googleCloudMldiagnosticsProfiler.image.repository
          }}:{{ .Values.diagonConnectionOperator.googleCloudMldiagnosticsProfiler.image.tag
          | default .Chart.AppVersion }}'
        name: google-cloud-mldiagnostics-profiler
        ports:
        - containerPort: 5001
        resources: {}
        securityContext: {{- toYaml .Values.diagonConnectionOperator.googleCloudMldiagnosticsProfiler.containerSecurityContext
          | nindent 10 }}
        volumeMounts:
        - mountPath: /tmp
          name: tmp-volume
      securityContext: {{- toYaml .Values.diagonConnectionOperator.podSecurityContext |
        nindent 8 }}
      serviceAccountName: {{ include "chart.fullname" . }}
      terminationGracePeriodSeconds: 10
      volumes:
      - emptyDir: {}
        name: tmp-volume
@@ -0,0 +1,53 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ include "chart.fullname" . }}-leader-election-role
  labels:
  {{- include "chart.labels" . | nindent 4 }}
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ include "chart.fullname" . }}-leader-election-rolebinding
  labels:
  {{- include "chart.labels" . | nindent 4 }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: '{{ include "chart.fullname" . }}-leader-election-role'
subjects:
- kind: ServiceAccount
  name: '{{ include "chart.fullname" . }}'
  namespace: '{{ .Release.Namespace }}'
@@ -0,0 +1,56 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "chart.fullname" . }}-manager-role
  labels:
  {{- include "chart.labels" . | nindent 4 }}
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - diagon.gke.io
  resources:
  - mldiagnosticsconnections
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - diagon.gke.io
  resources:
  - mldiagnosticsconnections/finalizers
  verbs:
  - update
- apiGroups:
  - diagon.gke.io
  resources:
  - mldiagnosticsconnections/status
  verbs:
  - get
  - patch
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ include "chart.fullname" . }}-manager-rolebinding
  labels:
  {{- include "chart.labels" . | nindent 4 }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: '{{ include "chart.fullname" . }}-manager-role'
subjects:
- kind: ServiceAccount
  name: '{{ include "chart.fullname" . }}'
  namespace: '{{ .Release.Namespace }}'
@@ -0,0 +1,34 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "chart.fullname" . }}-metrics-auth-role
  labels:
  {{- include "chart.labels" . | nindent 4 }}
rules:
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ include "chart.fullname" . }}-metrics-auth-rolebinding
  labels:
  {{- include "chart.labels" . | nindent 4 }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: '{{ include "chart.fullname" . }}-metrics-auth-role'
subjects:
- kind: ServiceAccount
  name: '{{ include "chart.fullname" . }}'
  namespace: '{{ .Release.Namespace }}'
@@ -0,0 +1,11 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "chart.fullname" . }}-metrics-reader
  labels:
  {{- include "chart.labels" . | nindent 4 }}
rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get
@@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
  name: {{ include "chart.fullname" . }}-controller-manager-metrics
  labels:
    control-plane: controller-manager
  {{- include "chart.labels" . | nindent 4 }}
spec:
  type: {{ .Values.metricsService.type }}
  selector:
    app.kubernetes.io/name: connection-operator
    control-plane: diagon-connection-operator
  {{- include "chart.selectorLabels" . | nindent 4 }}
  ports:
  {{- .Values.metricsService.ports | toYaml | nindent 2 }}