Question 1

In dbt, what is the main purpose of Jinja macros?

Accepted Answer

Jinja macros enable code reuse across dbt models. They work like functions: they accept parameters and render SQL when dbt compiles the project. This avoids duplicated logic and makes complex transformations easier to maintain throughout the project.
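
As a rough illustration of the reuse described above, the sketch below defines one macro and calls it twice from a model. The macro name cents_to_dollars, the file paths, the column names, and the stg_orders model are all hypothetical, chosen only for this example.

```sql
{# macros/cents_to_dollars.sql (hypothetical macro) #}
{% macro cents_to_dollars(column_name, scale=2) %}
    round({{ column_name }} / 100.0, {{ scale }})
{% endmacro %}

{# models/orders.sql (hypothetical model reusing the macro twice) #}
select
    order_id,
    {{ cents_to_dollars('amount_cents') }} as amount,
    {{ cents_to_dollars('tax_cents', 4) }} as tax
from {{ ref('stg_orders') }}

{# Rendered SQL, whitespace aside:
   select order_id,
          round(amount_cents / 100.0, 2) as amount,
          round(tax_cents / 100.0, 4) as tax
   from <relation that stg_orders resolves to> #}
```

If the rounding rule ever changes, only the macro body needs editing; every model that calls it picks up the change at the next compile.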

Question 2

How do you define a reusable dbt macro in a file?

Accepted Answer

A dbt macro is defined with a Jinja {% macro %} ... {% endmacro %} block in a .sql file inside the project's macros folder. The macro name follows the macro keyword, with its parameters in parentheses. The macro can then be called from any model in the project using {{ macro_name(arguments) }}.
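
A minimal sketch of such a file, assuming a hypothetical macro named sum_by_status and hypothetical column and model names (status, amount, stg_orders). It shows the name after the macro keyword, the parameters in parentheses, and a call from a model.

```sql
{# macros/sum_by_status.sql (hypothetical file inside the macros folder) #}
{% macro sum_by_status(amount_column, statuses) %}
    {% for status in statuses %}
    sum(case when status = '{{ status }}' then {{ amount_column }} else 0 end) as {{ status }}_amount
    {%- if not loop.last %},{% endif %}
    {% endfor %}
{% endmacro %}

{# models/order_totals.sql (hypothetical model calling the macro) #}
select
    customer_id,
    {{ sum_by_status('amount', ['placed', 'shipped']) }}
from {{ ref('stg_orders') }}
group by customer_id
```

The loop emits one aggregated column per status and uses loop.last to skip the trailing comma, so adding a status means changing only the list passed in the call.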

Question 3

What is the difference between the 'timestamp' and 'check' strategies for dbt snapshots?

Accepted Answer

The timestamp strategy relies on a last-modified column (configured as updated_at) to detect changes; it is usually more efficient because only that one column is compared. The check strategy compares the values of the columns listed in check_cols against the last snapshotted values to detect any change, which is useful when no reliable timestamp column is available.
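
The two strategies might be configured as sketched below, using the snapshot block syntax in a .sql file; recent dbt versions also accept an equivalent YAML configuration. The snapshot names, the snapshots schema, the shop source, and the column names are assumptions made up for this example.

```sql
{# snapshots/orders_snapshots.sql (hypothetical file, source, and column names) #}

{# timestamp strategy: a row counts as changed when updated_at is newer #}
{% snapshot orders_snapshot_timestamp %}
{{
    config(
        target_schema='snapshots',
        unique_key='order_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}
select * from {{ source('shop', 'orders') }}
{% endsnapshot %}

{# check strategy: a row counts as changed when any listed column differs #}
{% snapshot orders_snapshot_check %}
{{
    config(
        target_schema='snapshots',
        unique_key='order_id',
        strategy='check',
        check_cols=['status', 'total_amount']
    )
}}
select * from {{ source('shop', 'orders') }}
{% endsnapshot %}
```

With the check strategy, check_cols can also be set to 'all' to compare every column, at the cost of a wider comparison on each run.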

dbt - Advanced Features

Which columns are automatically added by dbt when creating a snapshot?

How do you configure an incremental model with the 'merge' strategy in dbt?