AI starter kit chart #579
Draft
volatilemolotov wants to merge 20 commits into kubernetes:master from volatilemolotov:ai-starter-kit-chart-2
Changes from 2 commits
Commits
c01a078 init ai-starter-kit (volatilemolotov)
d57a1ea Updated PVCs. added GPU support, added MacOS support (volatilemolotov)
05c0644 remove example values and ci (volatilemolotov)
6995e3c clean up makefile (volatilemolotov)
74322fc changes to readme (volatilemolotov)
2755940 update readme and change default password (volatilemolotov)
2781c92 remove output from ray.ipynb (drogovozDP)
a871088 Merge pull request #29 from volatilemolotov/ai-starter-kit-2-remove-blob (volatilemolotov)
7d7df31 update readme (volatilemolotov)
6bf740d add pre delete hook for singleuser env (volatilemolotov)
ceb424e readme fixes, added hook to delete singleuser pod (volatilemolotov)
42ca138 Applying fixes to resolve PR comments (alex-akv)
b20d943 Merge pull request #30 from volatilemolotov/comment-resolve (volatilemolotov)
0672f83 remove mlflow enabled from doc (volatilemolotov)
ced46e9 Applying fixes to resolve PR comments (alex-akv)
78a03d7 Updating readme with makefile command description (alex-akv)
85f83fd Add model cache pvc and the "serve" field for ramalama (#31) (ArthurKamalov)
a796ea6 Applying fixes to the multi-agent ray (alex-akv)
f924e3e Applying fixes for all multi agent notebooks (alex-akv)
6d43c94 Applying fixes for all multi agent notebooks (alex-akv)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,70 @@ | ||
| .PHONY: check_hf_token check_OCI_target package_helm push_helm lint dep_update install install_gke install_gke_gpu start start_gpu uninstall destroy validate_jupyterhub validate_ray | ||
|
|
||
| check_hf_token: | ||
| ifndef HF_TOKEN | ||
| $(error HF_TOKEN is not set) | ||
| endif | ||
|
|
||
| check_OCI_target: | ||
| ifndef OCI_HELM_TARGET | ||
| $(error OCI_HELM_TARGET is not set) | ||
| endif | ||
|
|
||
| package_helm: | ||
| helm package helm-chart/ai-starter-kit/ --destination out/ | ||
|
|
||
| push_helm: check_OCI_target | ||
| helm push out/ai-starter-kit* oci://$$OCI_HELM_TARGET | ||
|
|
||
| lint: | ||
| helm lint helm-chart/ai-starter-kit | ||
|
|
||
| dep_update: | ||
| helm dependency update helm-chart/ai-starter-kit | ||
|
|
||
| install: check_hf_token | ||
| helm upgrade --install ai-starter-kit helm-chart/ai-starter-kit --set huggingface.token="$$HF_TOKEN" --timeout 10m -f helm-chart/ai-starter-kit/values.yaml | ||
|
|
||
| install_gke: check_hf_token | ||
| helm upgrade --install ai-starter-kit helm-chart/ai-starter-kit --set huggingface.token="$$HF_TOKEN" --timeout 10m -f helm-chart/ai-starter-kit/values-gke.yaml | ||
|
|
||
| install_gke_gpu: check_hf_token | ||
| helm upgrade --install ai-starter-kit helm-chart/ai-starter-kit --set huggingface.token="$$HF_TOKEN" --timeout 10m -f helm-chart/ai-starter-kit/values-gke-gpu.yaml | ||
|
|
||
| start: | ||
| mkdir -p /tmp/models-cache | ||
| minikube start --cpus 4 --memory 15000 --mount --mount-string="/tmp/models-cache:/tmp/models-cache" | ||
|
|
||
| start_gpu: | ||
| mkdir -p $$HOME/models-cache | ||
| minikube start --driver krunkit --cpus 4 --memory 15000 --mount --mount-string="$$HOME/models-cache:$$HOME/models-cache" | ||
|
|
||
| uninstall: | ||
| helm uninstall ai-starter-kit | ||
| kubectl delete pod jupyter-user | ||
| kubectl delete pvc ai-starter-kit-jupyterhub-hub-db-dir | ||
|
|
||
| destroy: | ||
| minikube delete | ||
|
|
||
| validate_jupyterhub: | ||
| kubectl get pods; \ | ||
| kubectl wait --for=condition=Ready pods -l 'component!=continuous-image-puller' --timeout=1800s; \ | ||
| kubectl get pods; \ | ||
| kubectl get services; \ | ||
| kubectl port-forward service/ai-starter-kit-jupyterhub-proxy-public 8081:80 & \ | ||
| PID=$$!; \ | ||
| echo "Port-forward PID=$${PID}"; \ | ||
| sleep 5s; \ | ||
| python3 ./ci/test_hub.py "127.0.0.1:8081"; \ | ||
| kill $$PID | ||
|
|
||
| validate_ray: | ||
| kubectl wait --for=condition=Ready pods -l 'app.kubernetes.io/created-by=kuberay-operator' --timeout=1800s; \ | ||
| kubectl get pods; \ | ||
| kubectl get services; \ | ||
| kubectl port-forward service/ai-starter-kit-kuberay-head-svc 8265:8265 & \ | ||
| PID=$$!; \ | ||
| sleep 10s; \ | ||
| ray job submit --address=http://127.0.0.1:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"; \ | ||
| kill $$PID | ||
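The `validate_jupyterhub` and `validate_ray` recipes above start `kubectl port-forward` in the background, run a check, and then `kill` the forwarder. As written, `kill` is the last command in the chained recipe, so its exit status appears to be what `make` sees, and a failing check could go unnoticed. Below is a minimal Python sketch of the same pattern that always stops the background process and returns the check's own exit code. The helper name `run_with_background` and its parameters are hypothetical and not part of this PR.

```python
import subprocess
import time


def run_with_background(bg_cmd, fg_cmd, warmup_seconds=0.5):
    """Run fg_cmd while bg_cmd is alive, then always stop bg_cmd.

    Returns the exit code of fg_cmd, so a failing check is not masked
    by the cleanup step.
    """
    background = subprocess.Popen(bg_cmd)
    try:
        # Crude stand-in for the fixed `sleep` used in the recipes.
        time.sleep(warmup_seconds)
        return subprocess.run(fg_cmd).returncode
    finally:
        # Runs on success, failure, or exception.
        background.terminate()
        background.wait(timeout=10)
```

For the JupyterHub check, `bg_cmd` would be the `kubectl port-forward service/ai-starter-kit-jupyterhub-proxy-public 8081:80` argument list and `fg_cmd` the `python3 ./ci/test_hub.py "127.0.0.1:8081"` invocation, with the return value passed to `sys.exit`.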
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| project_id = "" | ||
| default_resource_name = "" | ||
|
|
||
| cluster_name = "" # Leave empty to use the default name (default_resource_name) | ||
| cluster_location = "us-central1" | ||
| private_cluster = false | ||
| autopilot_cluster = true | ||
|
|
||
| service_account_name = "" # Leave empty to use the default name |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,108 @@ | ||
| terraform { | ||
|
|
||
| required_providers { | ||
| kubectl = { | ||
| source = "gavinbunney/kubectl" | ||
| version = ">= 1.19.0" | ||
| } | ||
| } | ||
| } | ||
| data "google_client_config" "default" {} | ||
|
|
||
|
|
||
| data "google_project" "project" { | ||
| project_id = var.project_id | ||
| } | ||
|
|
||
|
|
||
| locals { | ||
| cluster_name = var.cluster_name != "" ? var.cluster_name : var.default_resource_name | ||
| } | ||
|
|
||
| module "gke_cluster" { | ||
| source = "github.com/ai-on-gke/common-infra/common/infrastructure?ref=main" | ||
|
|
||
| project_id = var.project_id | ||
| cluster_name = local.cluster_name | ||
| cluster_location = var.cluster_location | ||
| autopilot_cluster = var.autopilot_cluster | ||
| private_cluster = var.private_cluster | ||
| create_network = false | ||
| network_name = "default" | ||
| subnetwork_name = "default" | ||
| enable_gpu = true | ||
| gpu_pools = [ | ||
| { | ||
| name = "gpu-pool-l4" | ||
| machine_type = "g2-standard-24" | ||
| node_locations = "us-central1-a" ## comment to autofill node_location based on cluster_location | ||
| autoscaling = true | ||
| min_count = 1 | ||
| max_count = 3 | ||
| disk_size_gb = 100 | ||
| disk_type = "pd-balanced" | ||
| enable_gcfs = true | ||
| logging_variant = "DEFAULT" | ||
| accelerator_count = 2 | ||
| accelerator_type = "nvidia-l4" | ||
| gpu_driver_version = "DEFAULT" | ||
| } | ||
| ] | ||
| ray_addon_enabled = false | ||
| } | ||
|
|
||
| locals { | ||
| #ca_certificate = base64decode(module.gke_cluster.ca_certificate) | ||
| cluster_membership_id = var.cluster_membership_id == "" ? local.cluster_name : var.cluster_membership_id | ||
| host = var.private_cluster ? "https://connectgateway.googleapis.com/v1/projects/${data.google_project.project.number}/locations/${var.cluster_location}/gkeMemberships/${local.cluster_membership_id}" : "https://${module.gke_cluster.endpoint}" | ||
|
|
||
| } | ||
|
|
||
| provider "kubernetes" { | ||
| alias = "ai_starter_kit" | ||
| host = local.host | ||
| token = data.google_client_config.default.access_token | ||
| cluster_ca_certificate = var.private_cluster ? "" : base64decode(module.gke_cluster.ca_certificate) | ||
|
|
||
| dynamic "exec" { | ||
| for_each = var.private_cluster ? [1] : [] | ||
| content { | ||
| api_version = "client.authentication.k8s.io/v1beta1" | ||
| command = "gke-gcloud-auth-plugin" | ||
| } | ||
| } | ||
| } | ||
|
|
||
| locals { | ||
| service_account_name = var.service_account_name != "" ? var.service_account_name : var.default_resource_name | ||
| } | ||
|
|
||
|
|
||
| module "ai_starter_kit_workload_identity" { | ||
| providers = { | ||
| kubernetes = kubernetes.ai_starter_kit | ||
| } | ||
| source = "terraform-google-modules/kubernetes-engine/google//modules/workload-identity" | ||
| name = local.service_account_name | ||
| namespace = "default" | ||
| roles = ["roles/storage.objectUser"] | ||
| project_id = var.project_id | ||
| depends_on = [module.gke_cluster] | ||
| } | ||
|
|
||
| provider "kubectl" { | ||
| alias = "ai_starter_kit" | ||
| apply_retry_count = 15 | ||
| host = local.host | ||
| token = data.google_client_config.default.access_token | ||
| cluster_ca_certificate = var.private_cluster ? "" : base64decode(module.gke_cluster.ca_certificate) | ||
| load_config_file = true | ||
|
|
||
| dynamic "exec" { | ||
| for_each = var.private_cluster ? [1] : [] | ||
| content { | ||
| api_version = "client.authentication.k8s.io/v1beta1" | ||
| command = "gke-gcloud-auth-plugin" | ||
| } | ||
| } | ||
| } |
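The `locals` blocks above apply two rules: an empty name falls back to a default, and the API host is the Connect Gateway URL for private clusters or the cluster endpoint otherwise. The following Python sketch restates that logic for illustration only. The function names and all sample values (project number, endpoint address, resource name) are placeholders, not values from this PR.

```python
def resolve_name(explicit, default):
    """Mirror the pattern `var.x != "" ? var.x : <default>`."""
    return explicit if explicit != "" else default


def resolve_host(private_cluster, project_number, location, membership_id, endpoint):
    """Mirror the `host` local.

    Private clusters are reached through Connect Gateway; public
    clusters are reached directly at the cluster endpoint.
    """
    if private_cluster:
        return (
            "https://connectgateway.googleapis.com/v1/"
            f"projects/{project_number}/locations/{location}/"
            f"gkeMemberships/{membership_id}"
        )
    return f"https://{endpoint}"


# Placeholder inputs: cluster_name and cluster_membership_id left empty.
cluster_name = resolve_name("", "ai-starter-kit")
membership_id = resolve_name("", cluster_name)

private_host = resolve_host(True, "123456789", "us-central1", membership_id, "203.0.113.10")
public_host = resolve_host(False, "123456789", "us-central1", membership_id, "203.0.113.10")
print(private_host)
print(public_host)
```

With both names left empty, the membership ID falls back to the cluster name, which in turn falls back to `default_resource_name`.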
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
|
|
||
| output "gke_cluster_name" { | ||
| value = local.cluster_name | ||
| description = "GKE cluster name" | ||
| } | ||
|
|
||
| output "gke_cluster_location" { | ||
| value = var.cluster_location | ||
| description = "GKE cluster location" | ||
| } | ||
|
|
||
| output "project_id" { | ||
| value = var.project_id | ||
| description = "GCP project ID" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| variable "project_id" { | ||
| type = string | ||
| } | ||
| variable "default_resource_name" { | ||
| type = string | ||
| } | ||
| variable "cluster_name" { | ||
| type = string | ||
| } | ||
| variable "cluster_location" { | ||
| type = string | ||
| } | ||
| variable "autopilot_cluster" { | ||
| type = bool | ||
| } | ||
| variable "private_cluster" { | ||
| type = bool | ||
| } | ||
| variable "cluster_membership_id" { | ||
| type = string | ||
| description = "Required to use Connect Gateway for private clusters; defaults to cluster_name" | ||
| default = "" | ||
| } | ||
| variable "service_account_name" { | ||
| type = string | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| import sys | ||
| import requests | ||
| from packaging.version import Version as V | ||
|
|
||
|
|
||
| def test_hub_up(hub_url): | ||
| r = requests.get(hub_url) | ||
| r.raise_for_status() | ||
| print("JupyterHub up.") | ||
|
|
||
|
|
||
| def test_api_root(hub_url): | ||
| """ | ||
| Tests the hub api's root endpoint (/). The hub's version should be returned. | ||
|
|
||
| A typical jupyterhub logging response to this test: | ||
|
|
||
| [I 2019-09-25 12:03:12.051 JupyterHub log:174] 200 GET /hub/api ([email protected]) 9.57ms | ||
| """ | ||
| r = requests.get(hub_url + "/hub/api") | ||
| r.raise_for_status() | ||
| info = r.json() | ||
| version = info["version"] | ||
| assert V("4") <= V(version) <= V("5.5"), f"version {version} must be between 4 and 5.5" | ||
| print("JupyterHub Rest API is working.") | ||
|
|
||
|
|
||
| def test_hub_login(hub_url): | ||
| """ | ||
| Tests the hub dummy authenticator login credentials. Login credentials are taken | ||
| from /jupyter_config/config.yaml. After a successful login, the user is | ||
| redirected to /hub/spawn. | ||
| """ | ||
| username, password = "user", "sneakypass" | ||
| session = requests.Session() | ||
|
|
||
| response = session.get(hub_url + "/hub/login") | ||
| response.raise_for_status() | ||
|
|
||
| auth_params = {} | ||
| if "_xsrf" in session.cookies: | ||
| auth_params = {"_xsrf": session.cookies["_xsrf"]} | ||
|
|
||
| response = session.post( | ||
| hub_url + "/hub/login", | ||
| params=auth_params, | ||
| data={"username": username, "password": password}, | ||
| allow_redirects=True, | ||
| ) | ||
| response.raise_for_status() | ||
| assert (hub_url + "/hub/spawn-pending/user") in response.url, f"unexpected response url: got {response.url}, expected {hub_url}/hub/spawn-pending/user" | ||
| print("JupyterHub login success.") | ||
|
|
||
|
|
||
| hub_url = "http://" + sys.argv[1] | ||
|
|
||
| test_hub_up(hub_url) | ||
| test_api_root(hub_url) | ||
| test_hub_login(hub_url) |
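The script above takes a `host:port` argument and is called by the Makefile after a fixed `sleep`. One possible alternative, sketched here under the assumption that a successful TCP connection is a good enough readiness signal, is to poll the forwarded port before running the tests. The helpers `wait_for_port` and `split_host_port` are hypothetical and not part of this PR. Note that a successful connection only shows that the local forwarder is listening, not that JupyterHub itself is answering.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry after a short pause.
            time.sleep(interval)
    return False


def split_host_port(address):
    """Split an argument such as "127.0.0.1:8081" into (host, port)."""
    host, _, port = address.rpartition(":")
    return host, int(port)
```

A caller could run `wait_for_port(*split_host_port(sys.argv[1]))` before the three test functions and exit with an error if it returns `False`.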
What is the usage of the make commands?
You want me to document each?
Just in general, in the README. Users can still follow the current README to install via helm, so I'm not sure when these make commands should be used.
Documented in commit: 78a03d7