This repository contains practical examples of using Terraform to deploy and manage Databricks workspaces and resources across multiple cloud providers. It is intended as a reference for infrastructure engineers, data engineers, and anyone interested in automating Databricks deployments using Infrastructure as Code (IaC) principles.
More comprehensive examples are available in the official Databricks Terraform examples. This repository is meant as a lighter version, providing quick, commonly used workspace configurations.
NOTE: This guide assumes some prior knowledge of cloud providers and Terraform. If you are new to these tools, you may want to follow the Complete Beginner's Guide for a more guided approach.
The repository is organized by provider, with each provider having its own directory:
- `aws/` – For deploying Databricks on Amazon Web Services
- `azure/` – For deploying Databricks on Microsoft Azure
- `gcp/` – For deploying Databricks on Google Cloud Platform
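If you would rather consume an example as a module than copy its files, you can reference a provider directory from your own configuration. This is a minimal sketch using hypothetical input names (`region`, `workspace_name`); check each example's `variables.tf` for the inputs it actually defines:

```hcl
# Minimal sketch: referencing an example directory as a local module.
# The input names below are illustrative, not this repository's actual variables.
module "databricks_aws_workspace" {
  source = "./aws"

  region         = "us-east-1"      # hypothetical input
  workspace_name = "demo-workspace" # hypothetical input
}
```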
- **Clone the Repository**

  ```sh
  git clone https://github.com/work-apradana/terraform-databricks-examples.git
  cd terraform-databricks-examples
  ```
- **Choose Your Cloud Provider**
  - Navigate to the folder for your target cloud: `aws/`, `azure/`, or `gcp/`.
- **Review and Customize**
  - Review the example Terraform files.
  - Copy the folder or files to your own project directory.
  - Edit variable values as needed (see the `variables.tf` and example `.tfvars` files; a sketch of such a pairing follows these steps).
- **Initialize and Apply**
  - Initialize Terraform:

    ```sh
    terraform init
    ```
  - (Optional) Review the execution plan:

    ```sh
    terraform plan -var-file="sample.terraform.tfvars"
    ```
  - Apply the configuration:

    ```sh
    terraform apply -var-file="sample.terraform.tfvars"
    ```
- **Clean Up**
  - To destroy the resources created by the example:

    ```sh
    terraform destroy -var-file="sample.terraform.tfvars"
    ```
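As referenced in the Review and Customize step, each example pairs a `variables.tf` (declaring inputs) with a `.tfvars` file (supplying your values). Below is a minimal sketch with illustrative names only; the real variable names live in each example's `variables.tf`:

```hcl
# variables.tf (illustrative): declares the inputs an example expects.
variable "region" {
  description = "Cloud region to deploy into"
  type        = string
}

variable "workspace_name" {
  description = "Name for the Databricks workspace"
  type        = string
}

# sample.terraform.tfvars (illustrative): your local, uncommitted values.
# region         = "us-east-1"
# workspace_name = "demo-workspace"
```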
- Terraform Installation: See the Terraform installation instructions if you do not already have Terraform installed.
- Credentials & Authentication: These examples use user-to-machine (U2M) authentication for Databricks, AWS, and Azure, meaning you will need to set up each CLI and log in using `databricks auth login`, `aws sso login`, or `az login`. This minimizes the presence of long-term credentials such as service accounts/principals. For GCP, `gcloud auth application-default login` is typically used, but since it is blocked by my organizational policy, a service account key file is used instead; feel free to use the application default login method if it suits you. (See the provider configuration sketch after this list.)
- Sensitive Data: Do not commit sensitive information (such as secrets or cloud credentials) to version control. Use `.tfvars` files and reference them locally.
- Cost: Note that not all resources in these examples fall within the cloud providers' free tier limits. Most of the cost comes from networking components such as NAT gateways and data egress.
- Customization: These examples are intended as starting points. Adapt them to fit your needs and best practices.
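To show how the authentication setup above plugs into Terraform, here is a minimal provider configuration sketch. The Databricks provider can reuse a CLI-based U2M login via `auth_type = "databricks-cli"` and a configuration `profile`, and the Google provider accepts a service account key via `credentials`; the host URL, profile name, project, region, and file path below are all placeholders, not values from this repository:

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
    google = {
      source = "hashicorp/google"
    }
  }
}

# Databricks: reuse the token cached by `databricks auth login`.
# The workspace URL and profile name are placeholders.
provider "databricks" {
  host      = "https://<your-workspace>.cloud.databricks.com"
  auth_type = "databricks-cli"
  profile   = "DEFAULT"
}

# GCP: a service account key file, used here because application
# default login is blocked by the author's organizational policy.
provider "google" {
  project     = "my-gcp-project"    # placeholder
  region      = "us-central1"       # placeholder
  credentials = file("sa-key.json") # path is a placeholder
}
```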
Contributions, improvements, and new examples are welcome! Please open an issue or submit a pull request.
This repository is provided for educational purposes and is licensed under the MIT License.