dagster.yaml reference
The dagster.yaml file is used to configure the Dagster instance. It defines various settings for storage, run execution, logging, and other aspects of a Dagster deployment.
File location
By default, Dagster looks for the dagster.yaml file in the directory specified by the DAGSTER_HOME environment variable; that is, it reads $DAGSTER_HOME/dagster.yaml.
Using environment variables
You can source values in the dagster.yaml file from environment variables by using the env: key in place of a literal value:
instance:
  module: dagster.core.instance
  class: DagsterInstance
  config:
    some_key:
      env: ME_ENV_VAR
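The same env: pattern is commonly used to keep credentials out of the file. The sketch below assumes a DAGSTER_PG_PASSWORD variable is set in the deployment environment and uses it as the PostgreSQL storage password:
storage:
  postgres:
    postgres_db:
      hostname: localhost
      username: dagster
      password:
        env: DAGSTER_PG_PASSWORD
      db_name: dagster
      port: 5432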
Full configuration specification
Here's a comprehensive dagster.yaml specification with all available options:
local_artifact_storage:
  module: dagster.core.storage.root
  class: LocalArtifactStorage
  config:
    base_dir: '/path/to/dir'
compute_logs:
  module: dagster.core.storage.local_compute_log_manager
  class: LocalComputeLogManager
  config:
    base_dir: /path/to/compute/logs
# Alternatively, logs can be written to cloud storage providers like S3, GCS, and Azure Blob Storage. For example:
# compute_logs:
#   module: dagster_aws.s3.compute_log_manager
#   class: S3ComputeLogManager
#   config:
#     bucket: "mycorp-dagster-compute-logs"
#     prefix: "dagster-test-"
storage:
  sqlite:
    base_dir: /path/to/sqlite/storage
  # Or use postgres:
  # postgres:
  #   postgres_db:
  #     hostname: localhost
  #     username: dagster
  #     password: dagster
  #     db_name: dagster
  #     port: 5432
run_queue:
  max_concurrent_runs: 15
  tag_concurrency_limits:
    - key: 'database'
      value: 'redshift'
      limit: 4
    - key: 'dagster/backfill'
      limit: 10
run_storage:
  module: dagster.core.storage.runs
  class: SqliteRunStorage
  config:
    base_dir: /path/to/dagster/home/history
event_log_storage:
  module: dagster.core.storage.event_log
  class: SqliteEventLogStorage
  config:
    base_dir: /path/to/dagster/home/history
schedule_storage:
  module: dagster.core.storage.schedules
  class: SqliteScheduleStorage
  config:
    base_dir: /path/to/dagster/home/schedules
scheduler:
  module: dagster.core.scheduler
  class: DagsterDaemonScheduler
run_coordinator:
  module: dagster.core.run_coordinator
  class: DefaultRunCoordinator
  # class: QueuedRunCoordinator
  # config:
  #   max_concurrent_runs: 25
  #   tag_concurrency_limits:
  #     - key: "dagster/backfill"
  #       value: "true"
  #       limit: 1
run_launcher:
  module: dagster.core.launcher
  class: DefaultRunLauncher
  # module: dagster_docker
  # class: DockerRunLauncher
  # module: dagster_k8s.launcher
  # class: K8sRunLauncher
  # config:
  #   service_account_name: pipeline_run_service_account
  #   job_image: my_project/dagster_image:latest
  #   instance_config_map: dagster-instance
  #   postgres_password_secret: dagster-postgresql-secret
telemetry:
  enabled: true
run_monitoring:
  enabled: true
  poll_interval_seconds: 60
run_retries:
  enabled: true
  max_retries: 3
  retry_on_asset_or_op_failure: true
code_servers:
  local_startup_timeout: 360
secrets:
  my_secret:
    env: MY_SECRET_ENV_VAR
retention:
  schedule:
    purge_after_days: 90
  sensor:
    purge_after_days:
      skipped: 7
      failure: 30
      success: -1
sensors:
  use_threads: true
  num_workers: 8
schedules:
  use_threads: true
  num_workers: 8
auto_materialize:
  enabled: true
  minimum_interval_seconds: 3600
  run_tags:
    key: 'value'
  respect_materialization_data_versions: true
  max_tick_retries: 3
  use_sensors: false
  use_threads: false
  num_workers: 4
Configuration options
storage
Configures how job and asset history is persisted.
storage:
  sqlite:
    base_dir: /path/to/sqlite/storage
Options:
- sqlite: Use SQLite for storage
- postgres: Use PostgreSQL for storage (requires the dagster-postgres library)
- mysql: Use MySQL for storage (requires the dagster-mysql library; see the sketch after this list)
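A MySQL configuration is sketched below, assuming the mysql_db block mirrors the postgres_db fields shown in the full specification above; verify the exact schema against the dagster-mysql library before using it:
storage:
  mysql:
    mysql_db:
      hostname: localhost
      username: dagster
      password: dagster
      db_name: dagster
      port: 3306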
run_launcher
Determines where runs are executed.
run_launcher:
  module: dagster.core.launcher
  class: DefaultRunLauncher
Options:
- DefaultRunLauncher: Spawns a new process on the same node as the job's code location
- DockerRunLauncher: Allocates a Docker container per run
- K8sRunLauncher: Allocates a Kubernetes job per run (see the example after this list)
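For example, the K8sRunLauncher settings shown commented out in the full specification can be enabled as follows; the service account, image, config map, and secret names are placeholders for your own deployment:
run_launcher:
  module: dagster_k8s.launcher
  class: K8sRunLauncher
  config:
    service_account_name: pipeline_run_service_account
    job_image: my_project/dagster_image:latest
    instance_config_map: dagster-instance
    postgres_password_secret: dagster-postgresql-secret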
run_coordinator
Sets prioritization rules and concurrency limits for runs.
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 25
Options:
- DefaultRunCoordinator: Immediately sends runs to the run launcher
- QueuedRunCoordinator: Allows setting limits on concurrent runs (see the example after this list)
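For example, a global run limit can be combined with per-tag limits, mirroring the run_queue block in the full specification; the tag keys and limits here are illustrative:
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 25
    tag_concurrency_limits:
      - key: 'database'
        value: 'redshift'
        limit: 4
      - key: 'dagster/backfill'
        limit: 10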
compute_logs
Controls the capture and persistence of stdout and stderr logs.
compute_logs:
  module: dagster.core.storage.local_compute_log_manager
  class: LocalComputeLogManager
  config:
    base_dir: /path/to/compute/logs
Options:
- LocalComputeLogManager: Writes logs to disk
- NoOpComputeLogManager: Does not store logs
- AzureBlobComputeLogManager: Writes logs to Azure Blob Storage
- GCSComputeLogManager: Writes logs to Google Cloud Storage
- S3ComputeLogManager: Writes logs to AWS S3 (see the example after this list)
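For example, to write compute logs to S3 instead of local disk (requires the dagster-aws library; the bucket and prefix are placeholders):
compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: 'mycorp-dagster-compute-logs'
    prefix: 'dagster-test-'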
local_artifact_storage
Configures storage for artifacts that require a local disk, such as outputs written by the filesystem I/O manager.
local_artifact_storage:
  module: dagster.core.storage.root
  class: LocalArtifactStorage
  config:
    base_dir: /path/to/artifact/storage
telemetry
Controls whether Dagster collects anonymized usage statistics.
telemetry:
  enabled: false
code_servers
Configures how Dagster loads the code in a code location. local_startup_timeout sets how long, in seconds, to wait for a locally started code server before timing out.
code_servers:
  local_startup_timeout: 360
retention
Configures how long Dagster retains certain types of data, such as schedule and sensor tick history. A purge_after_days value of -1 retains that data indefinitely.
retention:
  schedule:
    purge_after_days: 90
  sensor:
    purge_after_days:
      skipped: 7
      failure: 30
      success: -1
sensors
Configures how sensors are evaluated.
sensors:
  use_threads: true
  num_workers: 8
schedules
Configures how schedules are evaluated.
schedules:
  use_threads: true
  num_workers: 8
auto_materialize
Configures automatic materialization of assets.
auto_materialize:
  enabled: true
  minimum_interval_seconds: 3600
  run_tags:
    key: 'value'
  respect_materialization_data_versions: true
  max_tick_retries: 3
  use_sensors: false
  use_threads: false
  num_workers: 4
Options:
- enabled: Whether auto-materialization is enabled (boolean)
- minimum_interval_seconds: Minimum interval between materializations (integer)
- run_tags: Tags to apply to auto-materialization runs (dictionary)
- respect_materialization_data_versions: Whether to respect data versions when materializing (boolean)
- max_tick_retries: Maximum number of retries for each auto-materialize tick that raises an error (integer, default: 3)
- use_sensors: Whether to use sensors for auto-materialization (boolean)
- use_threads: Whether to use threads for processing ticks (boolean, default: false)
- num_workers: Number of threads to use for processing ticks from multiple automation policy sensors in parallel (integer)
concurrency
Configures default concurrency limits for operations.
concurrency:
  default_op_concurrency_limit: 10
Options:
- default_op_concurrency_limit: The default maximum number of concurrent operations for an unconfigured concurrency key (integer)