Photo by Iqram-O-dowla Shawon / Unsplash

Unlocking Data Transformation Magic with dbt-core, ClickHouse, and Dagster

Data Dec 25, 2023

In this blog post, we embark on a journey to explore the seamless integration of dbt-core with Dagster, complemented by the powerful analytics capabilities of ClickHouse. Let's dive in and witness the synergy of these tools in action.

Introduction

At the heart of our data transformation endeavors lies dbt Core, a versatile tool that empowers data teams to wield analytics engineering best practices with ease. Coupled with dagster, an orchestration platform tailored for data asset development and observation, and ClickHouse, an open-source column-oriented database management system.

Minimum Software Requirements

Ensure you have the following components installed to get started.

Installations

Install the dbt ClickHouse plugin.

pip install dbt-clickhouse

Install the dagster-dbt library.

pip install dagster-dbt dagster-webserver

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Initialize the dbt project and configure it to interface with ClickHouse.

dbt init dbt_data_practitioner 

dbt1-1

Navigate to Your dbt Project Directory

cd dbt_data_practitioner
touch profiles.yml

Add ClickHouse configuration details to profiles.yml. Make sure all dependencies are correctly set up in your profiles.yml file for your database connections.

dbt_data_practitioner:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: sakila_db
      host: localhost
      port: 8123
      user: default
      password: root
      secure: False

Running dbt debug provides valuable insights into your dbt project's configuration and environment, ensuring everything is set up correctly before diving into data transformations. When you run dbt debug, it performs a series of checks and validations, including: Configuration Validation, Connection Testing, Adapter Information, Environment Information.

By running dbt debug as part of your setup process, you can catch any configuration errors or connectivity issues early on, ensuring a smoother experience when running dbt commands and executing data transformations.

dbt debug

dbt2

After executing this command, carefully review the output to ensure that everything is configured correctly and that dbt can successfully connect to your ClickHouse instance.

dbt docs

Generate documentation for the dbt project:

dbt docs generate

This command generates the documentation artifacts for your project. It creates a manifest.json and catalog.json file in the target directory.

dbt3

To view the documentation, you need to serve it locally using the following command:

dbt docs serve

dbt4

By default, this will start a local server at http://localhost:8080 where you can view the documentation.

2525

models

Create models and define transformations:

cd models  
mkdir sakila_db  
cd sakila_db  
touch actor_film_actor_join.sql
touch point_of_interest_1.sql 

Delete the examples folder present inside the models folder.

cd ..
cd .. 

Build the project:

The following command runs a series of tasks to build your models, including running models, tests, and snapshots. This command essentially combines the functionality of dbt run, dbt test, and dbt snapshot.

The dbt build command will perform the following tasks:

  1. Run Models: Executes SQL scripts to create views or tables based on your model definitions.
  2. Run Tests: Executes tests defined in your project to validate data quality and integrity.
  3. Run Snapshots: Executes snapshot scripts to capture and store historical data.
dbt build

dbt5

The tables and views defined are now generated in ClickHouse DB.

dbt docs generate
dbt docs serve

Lineage Graph and other details.

25252

Dagster Integration

Now, let's integrate Dagster into our data transformation pipeline:

cd dbt_data_practitioner
dagster-dbt project scaffold --project-name dagster_data_practitioner

dag1

cd dagster_data_practitioner
DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1 dagster dev

dag2-1

To access the Dagster UI from your browser, navigate to: http://127.0.0.1:3000

Dagster UI

dag3

Click on the black "Materialize all" button to materialize the transformations.

dag4

dag5

Sample Project

DataPractitioner is the sample project i've used to illustrate the usage of the aforementioned tools.

Noticed an issue with this Sample Project? Open an issue or a PR on GitHub!

Tags

Anantha Raju C

| Poetry | Music | Cinema | Books | Visual Art | Software Engineering |