3 changes: 3 additions & 0 deletions .gitignore
@@ -20,6 +20,9 @@ crash.*.log
*.tfvars
*.tfvars.json

# Allow gpu_instances.tfvars files (these contain example GPU configurations, not secrets)
!**/gpu_instances.tfvars
Review comment (Member):
Let's change the gpu_instances.tfvars to gpu_instances.tfvars.example


# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
11 changes: 11 additions & 0 deletions examples/aws/eks-private/README.md
@@ -34,6 +34,17 @@ terraform apply
If you are using a `tfvars` file, you will need to update the above commands accordingly.
Note the output from Terraform which includes an example cloud registration command you will use below.

#### Using Additional GPU Instance Types

To enable additional GPU instance types beyond the defaults (T4, A10G), use the provided `gpu_instances.tfvars` file:

```shell
terraform plan -var-file="gpu_instances.tfvars"
terraform apply -var-file="gpu_instances.tfvars"
```

This will enable additional GPU types including T4-4x, L4, and L4-4x. You can also customize which GPU types to enable by modifying the `node_group_gpu_types` variable in the tfvars file.
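
As a sketch of the customization mentioned above, a trimmed-down var file might enable only a subset of GPU types. The file name `my_gpus.tfvars` is hypothetical; the variable names and the L4 entry are taken from the `gpu_instances.tfvars` added in this PR.

```hcl
# my_gpus.tfvars (hypothetical file name, not part of this PR)

# Define only the extra GPU type that is needed; T4 and A10G keep their defaults.
additional_gpu_types = {
  "L4" = {
    product_name   = "NVIDIA-L4"
    instance_types = ["g6.2xlarge", "g6.4xlarge"]
  }
}

# Enable the default T4 node group plus the L4 node group defined above.
node_group_gpu_types = ["T4", "L4"]
```

Such a file would be passed the same way, for example `terraform plan -var-file="my_gpus.tfvars"`. Because `.gitignore` ignores `*.tfvars`, a custom file like this stays untracked.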

### Install the Kubernetes Requirements

The Anyscale Operator requires the following components:
24 changes: 14 additions & 10 deletions examples/aws/eks-private/eks.tf
@@ -27,16 +27,20 @@ locals {
   )

   # Map of GPU types to their product names and instance types
-  gpu_types = {
-    "T4" = {
-      product_name   = "Tesla-T4"
-      instance_types = ["g4dn.4xlarge"]
-    }
-    "A10G" = {
-      product_name   = "NVIDIA-A10G"
-      instance_types = ["g5.4xlarge"]
-    }
-  }
+  # Additional GPU types can be added via gpu_instances.tfvars
+  gpu_types = merge(
Review comment (Member):
Let's not do merge(default, additional).
Instead, specify everything through the variable, and move this default into the variable as well.

+    {
+      "T4" = {
+        product_name   = "Tesla-T4"
+        instance_types = ["g4dn.4xlarge"]
+      }
+      "A10G" = {
+        product_name   = "NVIDIA-A10G"
+        instance_types = ["g5.4xlarge"]
+      }
+    },
+    var.additional_gpu_types
+  )

   # Base configuration for GPU node groups
   gpu_node_group_base = {
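
The override behaviour this hunk relies on comes from Terraform's `merge()`: it is shallow, and when a key appears in more than one map the later argument wins. A minimal standalone sketch follows; the local names `defaults`, `additional`, and `merged` are illustrative and not part of this PR.

```hcl
locals {
  defaults = {
    "T4"   = { product_name = "Tesla-T4", instance_types = ["g4dn.4xlarge"] }
    "A10G" = { product_name = "NVIDIA-A10G", instance_types = ["g5.4xlarge"] }
  }

  # Stand-in for var.additional_gpu_types
  additional = {
    "T4" = { product_name = "Tesla-T4", instance_types = ["g4dn.xlarge", "g4dn.2xlarge"] }
    "L4" = { product_name = "NVIDIA-L4", instance_types = ["g6.2xlarge"] }
  }

  merged = merge(local.defaults, local.additional)
}

# The whole "T4" object is replaced, not merged field by field, so the
# default "g4dn.4xlarge" is dropped unless it is listed again.
output "t4_instance_types" {
  value = local.merged["T4"].instance_types # ["g4dn.xlarge", "g4dn.2xlarge"]
}

output "gpu_type_keys" {
  value = keys(local.merged) # ["A10G", "L4", "T4"] (keys() sorts lexicographically)
}
```

This is why the T4 override in `gpu_instances.tfvars` repeats `g4dn.4xlarge` rather than listing only the new instance sizes.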
38 changes: 38 additions & 0 deletions examples/aws/eks-private/gpu_instances.tfvars
@@ -0,0 +1,38 @@
# GPU Instance Types for EKS
#
# This file contains additional GPU instance configurations that can be used
# with the EKS cluster. To use these configurations, include this file when
# running terraform:
#
# terraform plan -var-file="gpu_instances.tfvars"
# terraform apply -var-file="gpu_instances.tfvars"
#
# You can also selectively enable specific GPU types by setting the
# node_group_gpu_types variable.
#
# Note: Entries here will override defaults with the same key (e.g., T4 below
# overrides the default T4 to add more instance types).

# GPU types - overrides defaults and adds new types
additional_gpu_types = {
  # Override default T4 to include additional instance types
  "T4" = {
    product_name   = "Tesla-T4"
    instance_types = ["g4dn.xlarge", "g4dn.2xlarge", "g4dn.4xlarge"]
  }
  "T4-4x" = {
    product_name   = "Tesla-T4"
    instance_types = ["g4dn.12xlarge"]
  }
  "L4" = {
    product_name   = "NVIDIA-L4"
    instance_types = ["g6.2xlarge", "g6.4xlarge"]
  }
  "L4-4x" = {
    product_name   = "NVIDIA-L4"
    instance_types = ["g6.24xlarge"]
  }
}

# Enable all GPU types (default A10G plus types defined above)
node_group_gpu_types = ["T4", "A10G", "T4-4x", "L4", "L4-4x"]
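
How `node_group_gpu_types` selects entries from the map is not shown in this diff. One plausible way to do it, sketched under that assumption, is a `for` expression that keeps only the enabled keys. Here `enabled_gpu_types` is a hypothetical local, and the stand-in `gpu_types` map mirrors part of the configuration above.

```hcl
variable "node_group_gpu_types" {
  type    = list(string)
  default = ["T4", "L4"]
}

locals {
  # Stand-in for the merged local.gpu_types map in eks.tf
  gpu_types = {
    "T4"   = { product_name = "Tesla-T4", instance_types = ["g4dn.4xlarge"] }
    "A10G" = { product_name = "NVIDIA-A10G", instance_types = ["g5.4xlarge"] }
    "L4"   = { product_name = "NVIDIA-L4", instance_types = ["g6.2xlarge", "g6.4xlarge"] }
  }

  # Hypothetical: keep only the GPU types that were explicitly enabled
  enabled_gpu_types = {
    for name, cfg in local.gpu_types : name => cfg
    if contains(var.node_group_gpu_types, name)
  }
}

output "enabled_gpu_type_names" {
  value = keys(local.enabled_gpu_types)
}
```

In this sketch a name listed in `node_group_gpu_types` but missing from the map is silently skipped, which may or may not match what the actual module does.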
31 changes: 30 additions & 1 deletion examples/aws/eks-private/variables.tf
@@ -81,12 +81,41 @@ variable "eks_cluster_version" {
 variable "node_group_gpu_types" {
   description = <<-EOT
     (Optional) The GPU types of the EKS nodes.
-    Possible values: ["T4", "A10G"]
+    Possible values: ["T4", "A10G"] plus any keys defined in additional_gpu_types
   EOT
   type        = list(string)
   default     = ["T4"]
 }

+variable "additional_gpu_types" {
Review comment (Member):
Just call it gpu_instance_types.
Add this to the default value:

    {
      "T4" = {
        product_name   = "Tesla-T4"
        instance_types = ["g4dn.4xlarge"]
      }
      "A10G" = {
        product_name   = "NVIDIA-A10G"
        instance_types = ["g5.4xlarge"]
      }
    },

+  description = <<-EOT
+    (Optional) Additional GPU types to add or override in the EKS cluster.
+    Entries with the same key as a default (e.g., "T4") will override the default entirely.
+    See gpu_instances.tfvars for examples.
+
+    ex:
+    ```
+    additional_gpu_types = {
+      # Override default T4 with more instance types
+      "T4" = {
+        product_name   = "Tesla-T4"
+        instance_types = ["g4dn.xlarge", "g4dn.2xlarge", "g4dn.4xlarge"]
+      }
+      # Add new GPU type
+      "L4" = {
+        product_name   = "NVIDIA-L4"
+        instance_types = ["g6.2xlarge", "g6.4xlarge"]
+      }
+    }
+    ```
+  EOT
+  type = map(object({
+    product_name   = string
+    instance_types = list(string)
+  }))
+  default = {}
+}
+
 variable "enable_efs" {
   description = <<-EOT
     (Optional) Enable the creation of an EFS instance.
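
A sketch of what the review comment on this file asks for, assuming the variable is renamed to `gpu_instance_types` and the defaults move out of `eks.tf`. This is one reading of the suggestion, not the final code of the PR; the shortened description text is made up for the example.

```hcl
variable "gpu_instance_types" {
  description = <<-EOT
    (Optional) Map of GPU types to their product names and instance types.
  EOT
  type = map(object({
    product_name   = string
    instance_types = list(string)
  }))
  default = {
    "T4" = {
      product_name   = "Tesla-T4"
      instance_types = ["g4dn.4xlarge"]
    }
    "A10G" = {
      product_name   = "NVIDIA-A10G"
      instance_types = ["g5.4xlarge"]
    }
  }
}

locals {
  # eks.tf no longer needs merge(); the variable is the single source of truth
  gpu_types = var.gpu_instance_types
}
```

One consequence of this design: a value set in a tfvars file replaces the default map wholesale, so the file has to list every GPU type it wants, including T4 and A10G.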
11 changes: 11 additions & 0 deletions examples/aws/eks-public/README.md
@@ -33,6 +33,17 @@ terraform apply
If you are using a `tfvars` file, you will need to update the above commands accordingly.
Note the output from Terraform which includes an example cloud registration command you will use below.

#### Using Additional GPU Instance Types

To enable additional GPU instance types beyond the defaults (T4, A10G), use the provided `gpu_instances.tfvars` file:

```shell
terraform plan -var-file="gpu_instances.tfvars"
terraform apply -var-file="gpu_instances.tfvars"
```

This will enable additional GPU types including T4-4x, L4, and L4-4x. You can also customize which GPU types to enable by modifying the `node_group_gpu_types` variable in the tfvars file.

### Install the Kubernetes Requirements

The Anyscale Operator requires the following components:
24 changes: 14 additions & 10 deletions examples/aws/eks-public/eks.tf
@@ -27,16 +27,20 @@ locals {
   )

   # Map of GPU types to their product names and instance types
-  gpu_types = {
-    "T4" = {
-      product_name   = "Tesla-T4"
-      instance_types = ["g4dn.4xlarge"]
-    }
-    "A10G" = {
-      product_name   = "NVIDIA-A10G"
-      instance_types = ["g5.4xlarge"]
-    }
-  }
+  # Additional GPU types can be added via gpu_instances.tfvars
+  gpu_types = merge(
+    {
+      "T4" = {
+        product_name   = "Tesla-T4"
+        instance_types = ["g4dn.4xlarge"]
+      }
+      "A10G" = {
+        product_name   = "NVIDIA-A10G"
+        instance_types = ["g5.4xlarge"]
+      }
+    },
+    var.additional_gpu_types
+  )

   # Base configuration for GPU node groups
   gpu_node_group_base = {
38 changes: 38 additions & 0 deletions examples/aws/eks-public/gpu_instances.tfvars
@@ -0,0 +1,38 @@
# GPU Instance Types for EKS
#
# This file contains additional GPU instance configurations that can be used
# with the EKS cluster. To use these configurations, include this file when
# running terraform:
#
# terraform plan -var-file="gpu_instances.tfvars"
# terraform apply -var-file="gpu_instances.tfvars"
#
# You can also selectively enable specific GPU types by setting the
# node_group_gpu_types variable.
#
# Note: Entries here will override defaults with the same key (e.g., T4 below
# overrides the default T4 to add more instance types).

# GPU types - overrides defaults and adds new types
additional_gpu_types = {
  # Override default T4 to include additional instance types
  "T4" = {
    product_name   = "Tesla-T4"
    instance_types = ["g4dn.xlarge", "g4dn.2xlarge", "g4dn.4xlarge"]
  }
  "T4-4x" = {
    product_name   = "Tesla-T4"
    instance_types = ["g4dn.12xlarge"]
  }
  "L4" = {
    product_name   = "NVIDIA-L4"
    instance_types = ["g6.2xlarge", "g6.4xlarge"]
  }
  "L4-4x" = {
    product_name   = "NVIDIA-L4"
    instance_types = ["g6.24xlarge"]
  }
}

# Enable all GPU types (default A10G plus types defined above)
node_group_gpu_types = ["T4", "A10G", "T4-4x", "L4", "L4-4x"]
31 changes: 30 additions & 1 deletion examples/aws/eks-public/variables.tf
@@ -81,12 +81,41 @@ variable "eks_cluster_version" {
 variable "node_group_gpu_types" {
   description = <<-EOT
     (Optional) The GPU types of the EKS nodes.
-    Possible values: ["T4", "A10G"]
+    Possible values: ["T4", "A10G"] plus any keys defined in additional_gpu_types
   EOT
   type        = list(string)
   default     = ["T4"]
 }

+variable "additional_gpu_types" {
+  description = <<-EOT
+    (Optional) Additional GPU types to add or override in the EKS cluster.
+    Entries with the same key as a default (e.g., "T4") will override the default entirely.
+    See gpu_instances.tfvars for examples.
+
+    ex:
+    ```
+    additional_gpu_types = {
+      # Override default T4 with more instance types
+      "T4" = {
+        product_name   = "Tesla-T4"
+        instance_types = ["g4dn.xlarge", "g4dn.2xlarge", "g4dn.4xlarge"]
+      }
+      # Add new GPU type
+      "L4" = {
+        product_name   = "NVIDIA-L4"
+        instance_types = ["g6.2xlarge", "g6.4xlarge"]
+      }
+    }
+    ```
+  EOT
+  type = map(object({
+    product_name   = string
+    instance_types = list(string)
+  }))
+  default = {}
+}
+
 variable "enable_efs" {
   description = <<-EOT
     (Optional) Enable the creation of an EFS instance.
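
Because `node_group_gpu_types` may now name keys that only exist when `additional_gpu_types` is set, a typo or a forgotten var file can leave an enabled type undefined. One possible guard, sketched here with a `check` block (available in Terraform 1.5 and later), is shown below. The check name and the locals `default_gpu_type_names` and `undefined_gpu_types` are hypothetical and not part of this PR.

```hcl
variable "node_group_gpu_types" {
  type    = list(string)
  default = ["T4"]
}

variable "additional_gpu_types" {
  type = map(object({
    product_name   = string
    instance_types = list(string)
  }))
  default = {}
}

locals {
  # Keys of the built-in defaults from eks.tf
  default_gpu_type_names = ["T4", "A10G"]

  # Enabled names that have no definition in either the defaults or the variable
  undefined_gpu_types = setsubtract(
    toset(var.node_group_gpu_types),
    toset(concat(local.default_gpu_type_names, keys(var.additional_gpu_types)))
  )
}

check "gpu_types_defined" {
  assert {
    condition     = length(local.undefined_gpu_types) == 0
    error_message = "Undefined GPU types in node_group_gpu_types: ${join(", ", tolist(local.undefined_gpu_types))}"
  }
}
```

A failing `check` assertion is reported as a warning and does not block `terraform apply`, so a stricter setup would need a different mechanism.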
11 changes: 11 additions & 0 deletions examples/gcp/gke-new_cluster/README.md
@@ -47,6 +47,17 @@ Steps for deploying Anyscale resources via Terraform:
If you are using a `tfvars` file, you will need to update the above commands accordingly.
Note the output from Terraform which includes an example cloud registration command you will use below.

#### Using Additional GPU Instance Types

To enable additional GPU instance types beyond the defaults (V100, P100, T4, L4, A100-40G, A100-80G, H100, H100-MEGA), use the provided `gpu_instances.tfvars` file:

```shell
terraform plan -var-file="gpu_instances.tfvars"
terraform apply -var-file="gpu_instances.tfvars"
```

This will enable additional GPU types including T4-lowcpu, T4-highcpu, T4-4x, L4-medium, and L4-4x. You can also customize which GPU types to enable by modifying the `node_group_gpu_types` variable in the tfvars file.

### Install the Kubernetes Requirements

The Anyscale Operator requires the following components: