
dbt - Fundamentals
dbt project, models, sources, refs, tests, documentation, materializations (table, view, incremental), seeds
1. What is dbt (data build tool)?
Answer
dbt is a data transformation tool for writing transformations in SQL and executing them inside a data warehouse. It applies software engineering practices (version control, testing, documentation) to data transformation work. dbt covers only the T of ELT: it does not extract or load data.
2. What is the basic structure of a dbt project?
Answer
A dbt project has a dbt_project.yml file at its root that defines the project configuration. The main folders are models (SQL files), tests (custom, or "singular", tests), macros (Jinja macros), seeds (CSV files), and snapshots (historical captures of changing data). The profiles.yml file, usually kept outside the project, defines the warehouse connections.
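As an illustration of the layout described above, here is a minimal sketch of a dbt_project.yml. The project name my_project and the staging and marts subfolders are hypothetical; the folder paths shown are the conventional defaults, written out for clarity.

```yaml
# dbt_project.yml (sketch; "my_project", "staging" and "marts" are hypothetical names)
name: my_project
version: "1.0.0"
config-version: 2

# Name of the connection profile to look up in profiles.yml
profile: my_project

# Where dbt looks for each kind of resource
model-paths: ["models"]        # SQL models
test-paths: ["tests"]          # custom (singular) tests
macro-paths: ["macros"]        # Jinja macros
seed-paths: ["seeds"]          # CSV files loaded with `dbt seed`
snapshot-paths: ["snapshots"]  # historical captures

# Default materializations per models subfolder
models:
  my_project:
    staging:
      +materialized: view
    marts:
      +materialized: table
```

Setting materializations per folder here keeps individual model files free of repeated config; a single model can still override the default in its own config block.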
3. What is the role of the profiles.yml file in dbt?
Answer
The profiles.yml file contains the connection details for data warehouses (BigQuery, Snowflake, Redshift, PostgreSQL, etc.). It is usually stored in the ~/.dbt/ folder rather than in the project, which keeps sensitive credentials out of version control. Each profile can define multiple targets (for example dev and prod), making it easy to switch between environments.
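A possible profiles.yml with two targets might look like the sketch below. It assumes a PostgreSQL warehouse; the profile name my_project, the hosts, users, database and schema names, and the environment variable names are all hypothetical.

```yaml
# ~/.dbt/profiles.yml (sketch; all names and hosts are hypothetical)
my_project:            # must match the `profile` key in dbt_project.yml
  target: dev          # target used when none is given on the command line
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dbt_user
      # Read the secret from the environment instead of hard-coding it
      password: "{{ env_var('DBT_DEV_PASSWORD') }}"
      dbname: analytics
      schema: dbt_dev
      threads: 4
    prod:
      type: postgres
      host: "{{ env_var('DBT_PROD_HOST') }}"
      port: 5432
      user: dbt_prod
      password: "{{ env_var('DBT_PROD_PASSWORD') }}"
      dbname: analytics
      schema: analytics
      threads: 8
```

With a file like this, dbt run builds into the dev target by default, while dbt run --target prod switches to the prod connection without any change to the project code.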