@sridharkonduri-99 commented Nov 26, 2025

Pull request checklist

Please check if your PR fulfills the following requirements:

  • pre-commit has been run
  • Tests for the changes have been added (for bug fixes / features)
  • All tests passing
  • Docs have been reviewed and added / updated if needed (for bug fixes / features)

Pull Request Type

  • Bugfix
  • New feature
  • Refactoring (no functional changes)
  • Documentation change
  • Other: Adding more node groups to cover all templates, according to the config files here:
    https://github.com/anyscale/product/blob/master/backend/workspace-templates.yaml (on_gallery=true)

Does this introduce a breaking change?

  • Yes
  • No

Terraform works for both GKE and EKS (new clusters, both public and private).

Screenshots:
GKE Nodegroup (default):

Screenshot 2025-12-04 at 9 57 11 AM

GKE Nodegroup (After additional instances):

Screenshot 2025-12-02 at 4 10 09 PM

EKS Nodegroup (After additional instance):
Screenshot 2025-12-02 at 4 00 19 PM

@github-actions bot added the documentation and examples labels Nov 26, 2025
Contributor

@chrisfellowes-anyscale left a comment

lgtm, just to confirm all of these additional pools are autoscaled with default min count 0 right?

@sridharkonduri-99
Author

lgtm, just to confirm all of these additional pools are autoscaled with default min count 0 right?

Yes. I have only tried the AWS stack so far, and min counts start from 0 there. I have seen some issues on the pod startup side (insufficient CPU and similar) and am investigating those. I will confirm for GCP as well.

@sridharkonduri-99 marked this pull request as ready for review December 2, 2025 10:42
@sridharkonduri-99 requested a review from a team as a code owner December 2, 2025 10:42
@sridharkonduri-99 changed the title from "WIP | Add more instances and node groups to cover all templates by default" to "Add more instances and node groups to cover all templates by default" Dec 2, 2025
- Making exception on gitignore for gpu_instances.tfvars
.gitignore Outdated
*.tfvars.json

# Allow gpu_instances.tfvars files (these contain example GPU configurations, not secrets)
!**/gpu_instances.tfvars
Member

Let's rename gpu_instances.tfvars to gpu_instances.tfvars.example

}
}
# Additional GPU types can be added via gpu_instances.tfvars
gpu_types = merge(
Member

Let's not do merge(default, additional).
Instead, take everything from the variable, and move this default into the variable's default value as well.

default = ["T4"]
}

variable "additional_gpu_types" {
Member

Just call it gpu_instance_types.
Add this to the default value:

    {
      "T4" = {
        product_name   = "Tesla-T4"
        instance_types = ["g4dn.4xlarge"]
      }
      "A10G" = {
        product_name   = "NVIDIA-A10G"
        instance_types = ["g5.4xlarge"]
      }
    },
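Put together, the two suggestions above could look roughly like the sketch below. The default entries are the ones given in the comment; the `description`, the `type` constraint, and the placement of `gpu_types` in a `locals` block are assumptions, since the surrounding code is not visible in the diff context.

```hcl
# Sketch only: description and type constraint are assumptions.
# The default entries are the ones listed in the review comment.
variable "gpu_instance_types" {
  description = "GPU node groups to create, keyed by GPU type."
  type = map(object({
    product_name   = string
    instance_types = list(string)
  }))
  default = {
    "T4" = {
      product_name   = "Tesla-T4"
      instance_types = ["g4dn.4xlarge"]
    }
    "A10G" = {
      product_name   = "NVIDIA-A10G"
      instance_types = ["g5.4xlarge"]
    }
  }
}

locals {
  # Assumed location of gpu_types. The variable is used as-is:
  # no merge() with a second "additional" variable.
  gpu_types = var.gpu_instance_types
}
```

With a single variable, there is one place to look for the complete set of GPU node groups, and the defaults stay visible in the variable definition itself.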

- Remove merge on gpu_instances, use full replace
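One consequence of full replace is that a value set in a tfvars file replaces the variable's default entirely, so an override has to restate any default entries it wants to keep. A possible shape for `gpu_instances.tfvars.example` is sketched below; the `L4` entry is a hypothetical illustration and is not taken from this PR.

```hcl
# gpu_instances.tfvars.example (illustrative sketch)
# Setting gpu_instance_types replaces its default, so T4 and A10G
# are restated here next to the extra entry.
gpu_instance_types = {
  "T4" = {
    product_name   = "Tesla-T4"
    instance_types = ["g4dn.4xlarge"]
  }
  "A10G" = {
    product_name   = "NVIDIA-A10G"
    instance_types = ["g5.4xlarge"]
  }
  # Hypothetical extra entry, for illustration only.
  "L4" = {
    product_name   = "NVIDIA-L4"
    instance_types = ["g6.4xlarge"]
  }
}
```

After copying the example to a real tfvars file, it can be passed with `terraform apply -var-file=gpu_instances.tfvars`.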
