tomarv2/terraform-databricks-workspace-management

❗️ Important

👉 This module assumes you already have a Databricks workspace deployed on AWS or Azure, along with:

👉 Workspace URL

👉 DAPI Token
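
With the workspace URL and DAPI token, the Databricks provider can be configured. A minimal sketch (the token variable name is illustrative):

provider "databricks" {
  host  = "https://<workspace_name>"   # workspace URL
  token = var.dapi_token               # DAPI token, e.g. supplied as a variable
}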

Versions

  • Module tested for Terraform 1.0.1.
  • databricks/databricks provider version 1.3.1
  • AWS provider version 4.14.
  • main branch: Provider versions are not pinned, to keep up with Terraform releases.
  • tags releases: Provider versions are pinned (use a tagged release).

What does this module do?

  • This is where you would normally start if you have just deployed your Databricks workspace.

Two cluster modes are supported by this module:

  • Single Node mode: to deploy a cluster in Single Node mode, set fixed_value to 0:
fixed_value         = 0
  • Standard mode: to deploy a cluster in Standard mode, two options are available:
fixed_value         = 1   # or any value greater than 0

OR

auto_scaling         = [1,3]
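
For example, a minimal module call deploying an autoscaling cluster (the git source ref is illustrative; pin a tagged release in practice):

module "databricks_workspace_management" {
  source = "git::https://github.com/tomarv2/terraform-databricks-workspace-management.git"

  deploy_cluster = true
  cluster_name   = "demo-cluster"
  auto_scaling   = [1, 3]   # or fixed_value = 0 for Single Node mode

  teamid = var.teamid   # required
  prjid  = var.prjid    # required
}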

A cluster can have one of these permissions: CAN_ATTACH_TO, CAN_RESTART, and CAN_MANAGE.

cluster_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_RESTART"
  },
  {
    user_name        = "<user_name>"
    permission_level = "CAN_RESTART"
  }
]
  • To build a cluster with a new cluster policy, use:
deploy_cluster_policy = true
policy_overrides = {
  "dbus_per_hour" : {
    "type" : "range",
    "maxValue" : 10
  },
  "autotermination_minutes" : {
    "type" : "fixed",
    "value" : 30,
    "hidden" : true
  }
}
  • To use an existing cluster policy, specify its id:
cluster_policy_id = "E0123456789"

To get an existing policy id, use:

curl -X GET --header "Authorization: Bearer $DAPI_TOKEN"  https://<workspace_name>/api/2.0/policies/clusters/list \
--data '{ "sort_order": "DESC", "sort_column": "POLICY_CREATION_TIME" }'

Cluster Policy ACL

policy_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_USE"
  },
  {
    user_name        = "<user_name>"
    permission_level = "CAN_USE"
  }
]

Note: To configure an instance pool, add the configuration below:

deploy_worker_instance_pool           = true
min_idle_instances                    = 1
max_capacity                          = 5
idle_instance_autotermination_minutes = 30

An instance pool can have one of these permissions: CAN_ATTACH_TO and CAN_MANAGE.

instance_pool_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_ATTACH_TO"
  },
  {
    user_name        = "<user_name>"
    permission_level = "CAN_ATTACH_TO"
  },
]

❗️ Important

If deploy_worker_instance_pool is set to true and auto_scaling is enabled, ensure that the instance pool's max_capacity is greater than the cluster's auto_scaling maximum.
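
A consistent pairing of the two settings, for illustration:

deploy_worker_instance_pool = true
max_capacity                = 5        # instance pool capacity
auto_scaling                = [1, 3]   # cluster max (3) stays below max_capacity (5)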

Two options are available for deploying a job:

  • Deploy the job to an existing cluster.
  • Deploy a new cluster and then deploy the job.

Two options are available to attach notebooks to a job:

  • Attach an existing notebook to a job.
  • Create a new notebook and attach it to a job.
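
For example, a job on an existing cluster built from a local notebook might look like the sketch below (the cluster id is illustrative, and the local_notebooks entry shape is assumed to mirror the notebooks variable shown later):

deploy_jobs = true
cluster_id  = "0223-210021-abcd999"   # existing cluster id (illustrative)
local_notebooks = [
  {
    name       = "demo_job_notebook"   # assumed fields; see the examples directory
    language   = "PYTHON"
    local_path = "notebooks/sample1.py"
  }
]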

A job can have one of these permissions: CAN_VIEW, CAN_MANAGE_RUN, IS_OWNER, and CAN_MANAGE.

Admins have the CAN_MANAGE permission by default and can assign it to non-admin users and service principals.

The job creator has the IS_OWNER permission. Destroying the databricks_permissions resource for a job reverts ownership to its creator.

Note:

  • A job must have exactly one owner. If the resource is changed and no owner is specified, the currently authenticated principal becomes the new owner of the job.
  • A job cannot have a group as an owner.
  • Jobs triggered through Run Now assume the permissions of the job owner, not of the user or service principal who issued Run Now.
jobs_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_MANAGE_RUN"
  },
  {
    user_name        = "<user_name>"
    permission_level = "CAN_MANAGE_RUN"
  }
]

AWS only

Add an instance profile at cluster creation time. It can control which data a given cluster can access through cloud-native controls.

add_instance_profile_to_workspace = true   # default: false
aws_attributes = {
    instance_profile_arn = "arn:aws:iam::123456789012:instance-profile/aws-instance-role"
}

Note: Set add_instance_profile_to_workspace to true to register the instance profile with the Databricks workspace. To use an instance profile that is already registered, set it to false.
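
For example, to reuse an instance profile that is already registered (the ARN is illustrative):

add_instance_profile_to_workspace = false   # profile already registered in the workspace
aws_attributes = {
    instance_profile_arn = "arn:aws:iam::123456789012:instance-profile/existing-role"
}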

Put notebooks in the notebooks folder and provide the information below:

  notebooks = [
    {
      name       = "demo_notebook1"
      language   = "PYTHON"
      local_path = "notebooks/sample1.py"
      path       = "/Shared/demo/sample1.py"
    },
    {
      name       = "demo_notebook2"
      local_path = "notebooks/sample2.py"
    }
  ]

Notebook ACL

A notebook can have one of these permissions: CAN_READ, CAN_RUN, CAN_EDIT, and CAN_MANAGE.

notebooks_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_MANAGE"
  },
  {
    user_name        = "<user_name>"
    permission_level = "CAN_MANAGE"
  }
]
  • To test which resources will be deployed, run terraform plan first (see Usage below).

Usage

Option 1:

terraform init
terraform plan -var='teamid=tryme' -var='prjid=project'
terraform apply -var='teamid=tryme' -var='prjid=project'
terraform destroy -var='teamid=tryme' -var='prjid=project'

Note: With this option you are responsible for configuring remote state storage yourself; a minimal backend sketch follows.
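
For example, a minimal S3 backend block (bucket, key, and region are illustrative; any supported backend works):

terraform {
  backend "s3" {
    bucket = "my-terraform-state"                # illustrative bucket name
    key    = "tryme/project/terraform.tfstate"   # e.g. derived from teamid/prjid
    region = "us-east-1"
  }
}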

Option 2:

Recommended method (store remote state in S3 using prjid and teamid to create directory structure):

  • Create a Python 3.8+ virtual environment:
python3 -m venv <venv name>
  • Install package:
pip install tfremote
  • Set the required environment variables for your cloud provider.

  • Update the examples directory with the required values.

NOTE: Please refer to the examples directory for reference configurations.


Troubleshooting

If you see error messages like the ones below, try running the same command again.

Error: Failed to delete token in Scope <scope name>
Error: Scope <scope name> does not exist!

Requirements

Name Version
terraform >= 1.0.1
aws >= 4.14
databricks >= 0.5.7
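
To pin these in a root module, a version-constraints sketch matching the table above:

terraform {
  required_version = ">= 1.0.1"

  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = ">= 0.5.7"
    }
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.14"
    }
  }
}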

Providers

Name Version
databricks >= 0.5.7

Modules

No modules.

Resources

Name Type
databricks_cluster.cluster resource
databricks_cluster_policy.this resource
databricks_group.this resource
databricks_group_member.group_members resource
databricks_instance_pool.driver_instance_nodes resource
databricks_instance_pool.worker_instance_nodes resource
databricks_instance_profile.shared resource
databricks_job.existing_cluster_new_job_existing_notebooks resource
databricks_job.existing_cluster_new_job_new_notebooks resource
databricks_job.new_cluster_new_job_existing_notebooks resource
databricks_job.new_cluster_new_job_new_notebooks resource
databricks_library.maven resource
databricks_library.python_wheel resource
databricks_notebook.notebook_file resource
databricks_notebook.notebook_file_deployment resource
databricks_permissions.cluster resource
databricks_permissions.driver_pool resource
databricks_permissions.existing_cluster_new_job_existing_notebooks resource
databricks_permissions.existing_cluster_new_job_new_notebooks resource
databricks_permissions.jobs_notebook resource
databricks_permissions.new_cluster_new_job_existing_notebooks resource
databricks_permissions.new_cluster_new_job_new_notebooks resource
databricks_permissions.notebook resource
databricks_permissions.policy resource
databricks_permissions.worker_pool resource
databricks_secret_acl.spectators resource
databricks_user.users resource
databricks_current_user.me data source
databricks_node_type.cluster_node_type data source
databricks_spark_version.latest data source

Inputs

Name Description Type Default Required
add_instance_profile_to_workspace Whether to register the AWS instance profile with the workspace bool false no
allow_cluster_create This is a field to allow the group to have cluster create privileges. More fine grained permissions could be assigned with databricks_permissions and cluster_id argument. Everyone without allow_cluster_create argument set, but with permission to use Cluster Policy would be able to create clusters, but within boundaries of that specific policy. bool true no
allow_instance_pool_create This is a field to allow the group to have instance pool create privileges. More fine grained permissions could be assigned with databricks_permissions and instance_pool_id argument. bool true no
always_running Whether the job should always be running, like a Spark Streaming application: on every update, restart the current active run or start a new one if nothing is running. False by default. bool false no
auto_scaling Number of min and max workers in auto scale. list(any) null no
aws_attributes Optional configuration block contains attributes related to clusters running on AWS. any null no
azure_attributes Optional configuration block contains attributes related to clusters running on Azure. any null no
category Node category, which can be one of: General purpose, Memory optimized, Storage optimized, Compute optimized, GPU string "General purpose" no
cluster_access_control Cluster access control any null no
cluster_autotermination_minutes cluster auto termination duration number 30 no
cluster_id Existing cluster id string null no
cluster_name Cluster name string null no
cluster_policy_id Existing cluster policy id string null no
create_group Create a new group, if group already exists the deployment will fail. bool false no
create_user Create a new user, if user already exists the deployment will fail. bool false no
custom_tags Extra custom tags any null no
data_security_mode Access mode string "NONE" no
databricks_username User allowed to access the platform. string "" no
deploy_cluster feature flag, true or false bool false no
deploy_cluster_policy feature flag, true or false bool false no
deploy_driver_instance_pool Driver instance pool bool false no
deploy_job_cluster feature flag, true or false bool false no
deploy_jobs feature flag, true or false bool false no
deploy_worker_instance_pool Worker instance pool bool false no
driver_node_type_id The node type of the Spark driver. This field is optional; if unset, API will set the driver node type to the same value as node_type_id. string null no
email_notifications Email notification block. any null no
fixed_value Number of nodes in the cluster. number 0 no
gb_per_core Number of gigabytes per core available on instance. Conflicts with min_memory_gb. Defaults to 0. string 0 no
gcp_attributes Optional configuration block contains attributes related to clusters running on GCP. any null no
gpu GPU required or not. bool false no
idle_instance_autotermination_minutes idle instance auto termination duration number 20 no
instance_pool_access_control Instance pool access control any null no
jobs_access_control Jobs access control any null no
libraries Installs a library on databricks_cluster map(any) {} no
local_disk Pick only nodes with local storage. string true no
local_notebooks Local path to the notebook(s) that will be used by the job any [] no
max_capacity instance pool maximum capacity number 3 no
max_concurrent_runs An optional maximum allowed number of concurrent runs of the job. number null no
max_retries An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with a FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry. The default behavior is to never retry. number 0 no
min_cores Minimum number of CPU cores available on instance. Defaults to 0. string 0 no
min_gpus Minimum number of GPUs attached to instance. Defaults to 0. string 0 no
min_idle_instances instance pool minimum idle instances number 1 no
min_memory_gb Minimum amount of memory per node in gigabytes. Defaults to 0. string 0 no
min_retry_interval_millis An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried. number null no
ml ML required or not. bool false no
notebooks Local path to the notebook(s) that will be deployed any [] no
notebooks_access_control Notebook access control any null no
policy_access_control Policy access control any null no
policy_overrides Cluster policy overrides any null no
prjid (Required) Name of the project/stack e.g: mystack, nifieks, demoaci. Should not be changed after running 'tf apply' string n/a yes
remote_notebooks Path to notebook(s) in the databricks workspace that will be used by the job any [] no
retry_on_timeout An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout. bool false no
schedule Job schedule configuration. map(any) null no
spark_conf Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration. any null no
spark_env_vars Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers. any null no
spark_version Runtime version of the cluster. Any supported databricks_spark_version id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control. string null no
task_parameters Base parameters to be used for each run of this job. map(any) {} no
teamid (Required) Name of the team/group e.g. devops, dataengineering. Should not be changed after running 'tf apply' string n/a yes
timeout An optional timeout applied to each run of this job. The default behavior is to have no timeout. number null no
worker_node_type_id The node type of the Spark worker. string null no

Outputs

Name Description
cluster_id databricks cluster id
cluster_name databricks cluster name
cluster_policy_id databricks cluster policy id
databricks_group databricks group name
databricks_group_member databricks group members
databricks_secret_acl databricks secret acl
databricks_user databricks user name
databricks_user_id databricks user id
existing_cluster_new_job_existing_notebooks_id databricks new cluster job id
existing_cluster_new_job_existing_notebooks_job databricks new cluster job url
existing_cluster_new_job_new_notebooks_id databricks new cluster job id
existing_cluster_new_job_new_notebooks_job databricks new cluster job url
instance_profile databricks instance profile ARN
new_cluster_new_job_existing_notebooks_id databricks job id
new_cluster_new_job_existing_notebooks_job databricks job url
new_cluster_new_job_new_notebooks_id databricks job id
new_cluster_new_job_new_notebooks_job databricks job url
notebook_url databricks notebook url
notebook_url_standalone databricks notebook url standalone