tomarv2/terraform-databricks-aws-workspace

❗️ Important

πŸ‘‰ This Terraform module assumes you have access to: https://accounts.cloud.databricks.com

πŸ‘‰ Databricks account username: databricks_account_username

πŸ‘‰ Databricks account password: databricks_account_password

πŸ‘‰ Databricks account id, databricks_account_id can be found on the bottom left corner of the page, once you're logged in.

πŸ‘‰ Part 2: Terraform module for Databricks Workspace management


Databricks deployment

Versions

  • Module tested for Terraform 1.0.1.
  • databrickslabs/databricks provider version 0.4.7
  • AWS provider version 3.47.
  • main branch: Provider versions not pinned to keep up with Terraform releases.
  • tags releases: Provider versions are pinned in tagged releases; use a tagged release for reproducible builds.

Usage

Option 1:

terraform init
terraform plan -var='teamid=tryme' -var='prjid=project1'
terraform apply -var='teamid=tryme' -var='prjid=project1'
terraform destroy -var='teamid=tryme' -var='prjid=project1'

Note: With this option, you are responsible for configuring remote state storage yourself.
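If you go with this option, a minimal S3 backend block can be added to your configuration to keep state out of your working directory. The bucket, key, and region below are placeholders, not values provided by this module:

```hcl
terraform {
  backend "s3" {
    # Hypothetical values; replace with your own state bucket and key.
    bucket = "my-terraform-state-bucket"
    key    = "databricks-workspace/terraform.tfstate"
    region = "us-west-2"
  }
}
```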

Option 2:

Recommended method (stores remote state in S3, using prjid and teamid to create the directory structure):

  • Create a Python 3.6+ virtual environment:
python3 -m venv <venv name>
  • Install the package:
pip install tfremote --upgrade
  • Set the environment variables below:
export TF_AWS_BUCKET=<remote state bucket name>
export TF_AWS_BUCKET_REGION=us-west-2
export TF_AWS_PROFILE=<profile from ~/.aws/credentials>

or

  • Set the environment variables below:
export TF_AWS_BUCKET=<remote state bucket name>
export TF_AWS_BUCKET_REGION=us-west-2
export AWS_ACCESS_KEY_ID=<aws_access_key_id>
export AWS_SECRET_ACCESS_KEY=<aws_secret_access_key>
  • Update the main.tf file with the required values.

  • Run the plan and verify the output before deploying:

tf -c=aws plan -var='teamid=foo' -var='prjid=bar'
  • Run the command below to deploy:
tf -c=aws apply -var='teamid=foo' -var='prjid=bar'
  • Run the command below to destroy:
tf -c=aws destroy -var='teamid=foo' -var='prjid=bar'


Databricks workspace creation with new role

module "databricks_workspace" {
  source = "git::git@github.com:tomarv2/terraform-databricks-aws-workspace.git"

  # NOTE: One of the below is required:
  # - 'profile_for_iam' - for IAM creation (if none is provided 'default' is used)
  # - 'existing_role_name'
  profile_for_iam             = "iam-admin"

  databricks_account_username = "example@example.com"
  databricks_account_password = "sample123!"
  databricks_account_id       = "1234567-1234-1234-1234-1234567"
  # -----------------------------------------
  # Do not change the teamid, prjid once set.
  teamid = var.teamid
  prjid  = var.prjid
}

Databricks workspace creation with existing role

module "databricks_workspace" {
  source = "git::git@github.com:tomarv2/terraform-databricks-aws-workspace.git"

  # NOTE: One of the below is required:
  # - 'profile_for_iam' - for IAM creation (if none is provided 'default' is used)
  # - 'existing_role_name'
  existing_role_name          = "demo-role"

  databricks_account_username = "example@example.com"
  databricks_account_password = "sample123!"
  databricks_account_id       = "1234567-1234-1234-1234-1234567"
  # -----------------------------------------
  # Do not change the teamid, prjid once set.
  teamid = var.teamid
  prjid  = var.prjid
}
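Once the workspace exists, the module's outputs (see the Outputs section) can be wired into a workspace-scoped Databricks provider. A minimal sketch; the alias name is illustrative, while workspace_url and databricks_token are actual outputs of this module:

```hcl
# Configure a workspace-level Databricks provider from module outputs.
provider "databricks" {
  alias = "workspace" # arbitrary alias, chosen for this example
  host  = module.databricks_workspace.workspace_url
  token = module.databricks_workspace.databricks_token
}
```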

Please refer to the examples directory for reference configurations.


Troubleshooting:

IAM policy error

If you see the error below:

Error: MALFORMED_REQUEST: Failed credentials validation checks: Spot Cancellation, Create Placement Group, Delete Tags, Describe Availability Zones, Describe instances, Describe Instance Status, Describe Placement Group, Describe Route Tables, Describe Security Groups, Describe Spot Instances, Describe Spot Price History, Describe Subnets, Describe Volumes, Describe Vpcs, Request Spot Instances
  • Try creating the workspace from the UI.
  • Verify that the role and policy exist (the assume-role trust policy should allow the external ID).
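For reference, the trust relationship Databricks expects can be generated with the provider's databricks_aws_assume_role_policy data source (which this module uses internally); the external ID is your Databricks account ID. The role name below is illustrative:

```hcl
# Sketch: build a cross-account role whose trust policy allows
# Databricks to assume it using your account ID as the external ID.
data "databricks_aws_assume_role_policy" "this" {
  external_id = var.databricks_account_id
}

resource "aws_iam_role" "cross_account" {
  name               = "demo-role" # illustrative name
  assume_role_policy = data.databricks_aws_assume_role_policy.this.json
}
```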

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.1 |
| aws | ~> 3.63 |
| databricks | 0.5.1 |
| random | ~> 3.1 |
| time | ~> 0.7 |

Providers

| Name | Version |
|------|---------|
| aws | ~> 3.63 |
| databricks | 0.5.1 |
| databricks.created_workspace | 0.5.1 |
| databricks.mws | 0.5.1 |
| random | ~> 3.1 |
| time | ~> 0.7 |

Modules

| Name | Source | Version |
|------|--------|---------|
| iam_policies | git::git@github.com:tomarv2/terraform-aws-iam-policies.git | v0.0.4 |
| iam_role | git::git@github.com:tomarv2/terraform-aws-iam-role.git//modules/iam_role_external | v0.0.7 |
| s3 | git::git@github.com:tomarv2/terraform-aws-s3.git | v0.0.8 |
| vpc | git::git@github.com:tomarv2/terraform-aws-vpc.git | v0.0.6 |

Resources

| Name | Type |
|------|------|
| aws_s3_bucket_policy.root_bucket_policy | resource |
| databricks_mws_credentials.this | resource |
| databricks_mws_networks.this | resource |
| databricks_mws_storage_configurations.this | resource |
| databricks_mws_workspaces.this | resource |
| databricks_token.pat | resource |
| random_string.naming | resource |
| time_sleep.wait | resource |
| aws_region.current | data source |
| databricks_aws_assume_role_policy.this | data source |
| databricks_aws_bucket_policy.this | data source |
| databricks_aws_crossaccount_policy.cross_account_iam_policy | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| cidr_block | The CIDR block for the VPC | string | "10.4.0.0/16" | no |
| custom_tags | Extra custom tags | any | null | no |
| databricks_account_id | External ID provided by third party | string | n/a | yes |
| databricks_account_password | Databricks account password | string | n/a | yes |
| databricks_account_username | Databricks account username | string | n/a | yes |
| databricks_hostname | Databricks hostname | string | "https://accounts.cloud.databricks.com" | no |
| existing_role_name | Name of an existing role to use; otherwise a new role will be created | string | null | no |
| prjid | Name of the project/stack, e.g. mystack, nifieks, demoaci. Should not be changed after running 'tf apply' | string | n/a | yes |
| profile | Profile to use for resource creation | string | "default" | no |
| profile_for_iam | Profile to use for IAM | string | null | no |
| region | AWS region to deploy resources | string | "us-east-1" | no |
| teamid | Name of the team/group, e.g. devops, dataengineering. Should not be changed after running 'tf apply' | string | n/a | yes |

Outputs

| Name | Description |
|------|-------------|
| databricks_credentials_id | Databricks credentials ID |
| databricks_deployment_name | Databricks deployment name |
| databricks_host | Databricks hostname |
| databricks_mws_credentials_id | Databricks MWS credentials ID |
| databricks_mws_network_id | Databricks MWS network ID |
| databricks_mws_storage_bucket_name | Databricks MWS storage bucket name |
| databricks_mws_storage_id | Databricks MWS storage ID |
| databricks_token | Value of the newly created token |
| databricks_token_lifetime_hours | Token validity |
| iam_role_arn | IAM role ARN |
| inline_policy_id | Inline policy ID |
| nonsensitive_databricks_token | Value of the newly created token (nonsensitive) |
| s3_bucket_arn | S3 bucket ARN |
| s3_bucket_id | S3 bucket ID |
| s3_bucket_name | S3 bucket name |
| storage_configuration_id | Databricks storage configuration ID |
| vpc_id | VPC ID |
| vpc_route_table_ids | List of VPC route table IDs |
| vpc_security_group_id | VPC security group ID |
| vpc_subnet_ids | List of subnet IDs within the VPC |
| workspace_url | Databricks workspace URL |