Databricks Connection
This feature is in preview and is available in Dagster+ in limited early access. Functionality and APIs may change as we continue development. To get early access to this feature, reach out to your Dagster account team. For more information, see the API lifecycle stages documentation.
This guide covers connecting Dagster+ to Databricks Unity Catalog to automatically discover and sync catalog, schema, table, and view metadata.
Overview
To create a Databricks Connection in Dagster+, you will need to:
- Generate authentication credentials with appropriate permissions.
- Add the credentials as an environment variable in Dagster+.
- Create the Databricks Connection in Dagster+.
Step 1: Generate authentication credentials with appropriate permissions
Dagster Connections require read-only access to Databricks Unity Catalog metadata. We recommend using a dedicated service principal with OAuth machine-to-machine (M2M) credentials, but personal access tokens (PATs) are also supported.
Required API scopes
The Databricks credentials used by the connection must be able to access the following API scopes:
- unity-catalog: Read catalog, schema, table, and view metadata from Unity Catalog, and data lineage between Unity Catalog objects.
- query-history: Read query history to surface usage information and derive table-level lineage.
- sql: Execute lineage queries against a SQL warehouse. Only required if you enable lineage tracking.
When creating a personal access token, select these scopes explicitly. Service principal OAuth credentials inherit scopes from the principal's permissions.
Option A: Service principal with OAuth (M2M) — recommended for production
Service principals provide more secure, auditable access that is not tied to a specific user account. Dagster authenticates as the service principal using OAuth machine-to-machine (M2M) credentials: a client ID and a client secret.
Step 1A.1: Create service principal
- In your Databricks workspace, navigate to Settings > Admin Console.
- Click Service principals in the left sidebar.
- Click Add service principal.
- Enter a name like dagster-connection.
- Click Add.
- Open the new service principal's details page and copy its Application ID (a UUID). You'll use this value both for Unity Catalog grants and for the connection's Client ID.
Step 1A.2: Grant Unity Catalog permissions
In Unity Catalog, grants to a service principal must use its Application ID, not its display name. Run the following against each catalog and schema you want to sync, replacing <application_id> with the UUID from the previous step:
-- Grant catalog access
GRANT USE CATALOG ON CATALOG <catalog_name> TO `<application_id>`;
-- Grant schema access
GRANT USE SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<application_id>`;
-- Grant read access to tables and views
GRANT SELECT ON SCHEMA <catalog_name>.<schema_name> TO `<application_id>`;
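When many schemas need to be synced, writing the grants by hand becomes repetitive. The following is a minimal Python sketch that renders the three statements above for a list of catalog and schema pairs. The helper name build_grants and the sample catalog and schema names are illustrative, not part of Dagster or Databricks, and the sketch assumes identifiers that need no extra quoting.

```python
from typing import Iterable


def build_grants(application_id: str, schemas: Iterable[tuple[str, str]]) -> list[str]:
    """Render the GRANT statements from this step for each (catalog, schema) pair.

    Hypothetical helper: it only builds SQL text and does not run anything.
    Catalog and schema names are assumed to be plain identifiers.
    """
    statements: list[str] = []
    seen_catalogs: set[str] = set()
    for catalog, schema in schemas:
        # USE CATALOG is only needed once per catalog.
        if catalog not in seen_catalogs:
            seen_catalogs.add(catalog)
            statements.append(
                f"GRANT USE CATALOG ON CATALOG {catalog} TO `{application_id}`;"
            )
        statements.append(
            f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{application_id}`;"
        )
        statements.append(
            f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO `{application_id}`;"
        )
    return statements


if __name__ == "__main__":
    # Placeholder application ID and names; substitute your own values.
    for line in build_grants("<application_id>", [("main", "sales"), ("main", "hr")]):
        print(line)
```

The output can be pasted into a Databricks SQL editor and reviewed before running.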
Step 1A.3: Generate an OAuth secret for the service principal
- In the Admin Console, open your dagster-connection service principal.
- Navigate to the Secrets tab.
- Click Generate secret.
- Set a lifetime (or "no expiration" for long-term use).
- Copy the Secret — it will only be shown once. Combined with the Application ID from Step 1A.1, this gives you the Client ID and Client Secret used by the connection.
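Before storing the secret, it can be useful to confirm that the Client ID and Client Secret work together. The sketch below builds a client-credentials token request against the workspace's OAuth token endpoint (/oidc/v1/token with the all-apis scope, as described in the Databricks OAuth M2M documentation). The function name and the placeholder values are assumptions for illustration. This is a manual check only and does not reflect how Dagster+ performs authentication internally.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    workspace_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth M2M token request for a workspace.

    Hypothetical helper for a manual credential check.
    """
    # Client ID and secret are sent as HTTP Basic credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; substitute your workspace URL and credentials.
    req = build_token_request(
        "https://dbc-1234abcd-56ef.cloud.databricks.com",
        "<application_id>",
        "<oauth_secret>",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

A successful response contains an access token; an HTTP 401 usually means the ID and secret do not match.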
Option B: Personal access token
For simpler setups or development environments, you can use a PAT tied to your user account.
Step 1B.1: Ensure your user has required permissions
Your user account needs these Unity Catalog privileges:
- USE CATALOG on target catalogs
- USE SCHEMA on target schemas
- SELECT on tables and views
Step 1B.2: Create personal access token
- Click your username in the top-right corner of the Databricks workspace
- Select User Settings
- Navigate to the Developer tab
- Click Manage next to Access tokens
- Click Generate new token
- Enter a comment like "Dagster Connection"
- Set expiration (or leave blank for no expiration)
- Click Generate
- Copy the token immediately - it won't be shown again
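To confirm that the new token can read Unity Catalog metadata, one option is to list catalogs through the Unity Catalog REST API. The sketch below builds that request with the token as a bearer credential. The function name and placeholder values are illustrative, and the check is independent of how Dagster+ itself uses the token.

```python
import urllib.request


def build_catalogs_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists Unity Catalog catalogs.

    Hypothetical helper for a manual token check.
    """
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.1/unity-catalog/catalogs",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder values; substitute your workspace URL and token.
    req = build_catalogs_request(
        "https://dbc-1234abcd-56ef.cloud.databricks.com", "<personal_access_token>"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

If the response lists the catalogs you intend to sync, the token and the USE CATALOG privileges are likely in place.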
Optional: Enable lineage tracking
To compute table lineage from Databricks query history, additional configuration is required regardless of which authentication option you chose:
- The service principal or user tied to the credentials used for the connection must have access to the metastore system tables. This typically requires one of:
  - Metastore admin privileges, or
  - Account admin privileges, which are required when the metastore was created by default at account creation time
- The warehouse ID of a SQL warehouse that the above identity can access must be provided when configuring the connection (see Step 3). Lineage queries are executed against this warehouse.
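One way to check both requirements at once is to run a trivial query against a lineage system table on the chosen warehouse. The sketch below builds a payload for the Databricks SQL Statement Execution API (POST /api/2.0/sql/statements). This guide does not state which system tables the connection reads, so system.access.table_lineage is used here only as an assumed example, and the function name is hypothetical.

```python
import json


def build_lineage_probe(warehouse_id: str) -> bytes:
    """Build a JSON body for POST /api/2.0/sql/statements.

    Hypothetical probe: the table queried is an assumption, chosen because
    it is a Databricks lineage system table, not because Dagster+ is known
    to query it.
    """
    payload = {
        "warehouse_id": warehouse_id,
        "statement": "SELECT 1 FROM system.access.table_lineage LIMIT 1",
        # Wait up to 30 seconds for the statement to finish.
        "wait_timeout": "30s",
    }
    return json.dumps(payload).encode()


if __name__ == "__main__":
    # Placeholder warehouse ID; substitute your own.
    print(build_lineage_probe("<warehouse_id>").decode())
```

Send the body with the same bearer credentials the connection will use. A permissions error on the table points to missing system table access, while an error on the warehouse points to missing warehouse access.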
Step 2: Store credentials in Dagster+
- In Dagster+, navigate to Deployment > Environment variables.
- Create a new environment variable to hold your credential:
  - If you used Option A (service principal with M2M OAuth):
    - Name: DATABRICKS_CONNECTION_CLIENT_SECRET (or any name you prefer)
    - Value: Paste your OAuth secret
  - If you used Option B (PAT):
    - Name: DATABRICKS_CONNECTION_TOKEN (or any name you prefer)
    - Value: Paste your PAT
Step 3: Create the Databricks Connection
- In Dagster+, click Connections in the left sidebar
- Click Create Connection
- Select Databricks as the connection type
- Configure the connection details
Required fields
- Connection name: A unique name for this Connection (e.g., databricks_unity_catalog)
  - This will become the name of the code location containing synced assets
- Workspace URL: Your Databricks workspace URL
  - Format: https://dbc-1234abcd-56ef.cloud.databricks.com
  - Find this in your browser address bar when logged into Databricks
- Authentication: Fill in the fields that match the method you chose in Step 1:
  - If you used Option A (service principal with M2M OAuth):
    - Client ID: The service principal's Application ID
    - Client secret environment variable: Name of the Dagster+ environment variable containing your OAuth secret (e.g., DATABRICKS_CONNECTION_CLIENT_SECRET)
  - If you used Option B (PAT):
    - Personal access token environment variable: Name of the Dagster+ environment variable containing your PAT (e.g., DATABRICKS_CONNECTION_TOKEN)
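A URL copied from the browser address bar often carries a path, query string, or fragment. The sketch below reduces such a URL to the scheme-and-host form shown in the Format example above. Whether Dagster+ tolerates extra path components is not stated in this guide, so trimming them is a cautious assumption, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def normalize_workspace_url(raw: str) -> str:
    """Reduce a pasted Databricks URL to https://<host>.

    Hypothetical helper; raises ValueError if the input is not an https URL.
    """
    parsed = urlparse(raw.strip())
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Expected an https:// workspace URL, got: {raw!r}")
    # Drop any path, query string, and fragment.
    return f"https://{parsed.netloc}"


if __name__ == "__main__":
    pasted = "https://dbc-1234abcd-56ef.cloud.databricks.com/?o=123#job/1"
    print(normalize_workspace_url(pasted))
```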
Optional: Warehouse ID for lineage tracking
- Warehouse ID: ID of the SQL warehouse used to execute lineage queries against the metastore system tables. Required to enable lineage tracking. For the permissions the connection identity must have, see Enable lineage tracking.
Optional: Configure asset filtering
Use filtering to control which catalogs, schemas, tables, views, and notebooks are synced. Patterns use regular expressions.
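Because the patterns are regular expressions, it helps to test them against sample names before saving the connection. The sketch below uses Python's re module to show how a pattern selects names. This guide does not specify whether Dagster+ matches a pattern against the whole name or any part of it, so the sketch shows both behaviors; the function name and the sample table names are illustrative.

```python
import re


def select_names(pattern: str, names: list[str], full_match: bool = True) -> list[str]:
    """Return the names selected by a regular expression.

    Hypothetical helper for trying out patterns locally. full_match=True
    requires the pattern to cover the whole name; False accepts a match
    anywhere in the name.
    """
    compiled = re.compile(pattern)
    matcher = compiled.fullmatch if full_match else compiled.search
    return [name for name in names if matcher(name)]


if __name__ == "__main__":
    tables = ["prod_sales", "dev_sales", "prod_hr"]
    print(select_names(r"prod_.*", tables))                 # ['prod_sales', 'prod_hr']
    print(select_names(r"sales", tables))                   # []
    print(select_names(r"sales", tables, full_match=False)) # ['prod_sales', 'dev_sales']
```

Writing patterns with explicit anchors or wildcards, such as prod_.* rather than prod_, keeps the result the same under either matching behavior.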