-
Notifications
You must be signed in to change notification settings - Fork 15
Add more instances and node groups to cover all templates by default #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
chrisfellowes-anyscale
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just to confirm all of these additional pools are autoscaled with default min count 0 right?
Yes, I have tried out AWS stack for now, min counts are from 0. Have seen some issues on pod start up side (insufficient CPU and all) investigating those, will confirm for GCP as well |
- Making exception on gitignore for gpu_instances.tfvars
.gitignore
Outdated
| *.tfvars.json | ||
|
|
||
| # Allow gpu_instances.tfvars files (these contain example GPU configurations, not secrets) | ||
| !**/gpu_instances.tfvars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's change the gpu_instances.tfvars to gpu_instances.tfvars.example
examples/aws/eks-private/eks.tf
Outdated
| } | ||
| } | ||
| # Additional GPU types can be added via gpu_instances.tfvars | ||
| gpu_types = merge( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not do merge(default, additional).
Instead, just specify everything from the variable and this default into variable as well.
| default = ["T4"] | ||
| } | ||
|
|
||
| variable "additional_gpu_types" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just call it gpu_instance_types.
Add this to the default value:
{
"T4" = {
product_name = "Tesla-T4"
instance_types = ["g4dn.4xlarge"]
}
"A10G" = {
product_name = "NVIDIA-A10G"
instance_types = ["g5.4xlarge"]
}
},
- Remove merge on gpu_instances, use full replace
Pull request checklist
Please check if your PR fulfills the following requirements:
Pull Request Type
https://github.com/anyscale/product/blob/master/backend/workspace-templates.yaml(on_gallery=true)Does this introduce a breaking change?
Terraform is working for both GKE and EKS (Both new clusters public/private)
Screenshots:
GKE Nodegroup (default):
GKE Nodegroup (After additional instances):
EKS Nodegroup (After additional instance):
