This article explores the types of tests in dbt (data build tool), including generic tests, dbt_utils package tests, and custom tests. It provides an overview of each type of test, their purpose, and how to use them effectively to ensure data quality and integrity.
In dbt (data build tool), there are several types of tests that can be utilized to ensure data quality and integrity. These tests can be categorized into three main groups: Generic Tests, dbt_utils Package Tests, and Custom Tests.
These are the standard tests provided by dbt to validate common data quality issues:
# In your .yml file
version: 2
models:
- name: your_model_name
columns:
- name: column_name
tests:
- not_null
# In your .yml file
version: 2
models:
- name: your_model_name
columns:
- name: column_name
tests:
- unique
# In your .yml file
version: 2
models:
- name: child_table
columns:
- name: foreign_key_column
tests:
- relationships:
to: ref('parent_table')
field: primary_key_column
# In your .yml file
version: 2
models:
- name: your_model_name
columns:
- name: column_name
tests:
- accepted_values:
values: ['value1', 'value2', 'value3']
The dbt_utils package extends the functionality of dbt with additional tests:
# In your .yml file
version: 2
models:
- name: your_model_name
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns: ['column1', 'column2']
# In your .yml file
version: 2
models:
- name: your_model_name
columns:
- name: column_name
tests:
- dbt_utils.not_constant
# In your .yml file
version: 2
models:
- name: your_model_name
columns:
- name: id_column
tests:
- dbt_utils.sequential_values
Custom tests allow you to write specific SQL queries to address unique data validation needs:
-- tests/no_large_transactions.sql
with large_transactions as (
select *
from {{ ref('your_model_name') }}
where transaction_amount > 10000
)
select count(*) as errors from large_transactions where errors > 0;
# In your .yml file
version: 2
models:
- name: your_model_name
tests:
- custom_test_name: no_large_transactions.sql
By leveraging these different types of tests, dbt users can ensure robust data quality checks throughout their data transformation processes. These tests help maintain data integrity and reliability, supporting accurate and consistent data analysis.