Question 1

What is dbt (data build tool)?

Accepted Answer

dbt is a data transformation tool that lets you write transformations as SQL SELECT statements and run them inside the data warehouse. It applies software engineering practices (version control, testing, documentation) to data transformation work. dbt handles only the T of ELT: it does not extract or load data.

Question 2

What is the basic structure of a dbt project?

Accepted Answer

A dbt project contains a dbt_project.yml file at the root that defines the project configuration. The main folders are models (SQL files that define the transformations), tests (custom data tests), macros (reusable Jinja macros), seeds (CSV files loaded into the warehouse), and snapshots (captures of how data changes over time). The profiles.yml file, usually kept outside the project, defines the warehouse connections.
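
As a rough illustration, a minimal dbt_project.yml might look like the sketch below. The project name my_project and the profile name my_profile are hypothetical. The path keys use the names from dbt 1.0 and later; older versions used different key names. The values shown are, to my understanding, the defaults in recent versions, so they are spelled out only to show how each folder maps to configuration.

```yaml
# dbt_project.yml (sketch; names are placeholders, not from a real project)
name: my_project            # hypothetical project name
version: "1.0.0"
config-version: 2

# Must match a profile name defined in profiles.yml
profile: my_profile         # hypothetical profile name

# Where dbt looks for each kind of resource
model-paths: ["models"]         # SQL models
test-paths: ["tests"]           # custom tests
macro-paths: ["macros"]         # Jinja macros
seed-paths: ["seeds"]           # CSV files
snapshot-paths: ["snapshots"]   # historical data captures
```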

Question 3

What is the role of the profiles.yml file in dbt?

Accepted Answer

The profiles.yml file holds the connection settings for data warehouses (BigQuery, Snowflake, Redshift, PostgreSQL, etc.). It is usually stored in the ~/.dbt/ folder rather than in the project, so that sensitive credentials are not committed to version control. Each profile can define multiple targets (for example dev and prod), which makes it easy to switch between environments.
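
A minimal sketch of a profiles.yml with two targets, assuming a PostgreSQL warehouse. The profile name, hosts, user, database, and schema names are all hypothetical placeholders. The password is read from an environment variable through dbt's env_var function, so it never appears in the file.

```yaml
# ~/.dbt/profiles.yml (sketch; every name and host below is a placeholder)
my_profile:                 # hypothetical; referenced by "profile:" in dbt_project.yml
  target: dev               # target used when none is given on the command line
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dbt_user
      password: "{{ env_var('DBT_PASSWORD') }}"   # read from the environment
      dbname: analytics
      schema: dbt_dev
      threads: 4
    prod:
      type: postgres
      host: prod-db.example.com
      port: 5432
      user: dbt_user
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: analytics
      threads: 8
```

With a file like this, dbt run would use the dev target by default, and dbt run --target prod would run the same project against the prod connection.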

