dbt for Data Analysts in 2026: Modeling, Testing and Interview Questions
Master dbt (data build tool) for data analytics — project structure, SQL modeling, testing strategies, and common interview questions with practical examples.

dbt (data build tool) has become the standard transformation layer in the modern data stack, and data analyst roles in 2026 increasingly require hands-on dbt skills. With dbt Core v2.0 in alpha and v1.12 stable, the framework now covers everything from SQL modeling to automated testing and documentation — all version-controlled in Git.
This guide covers dbt project structure, modeling patterns, testing strategies, and the interview questions most frequently asked in data analyst and data engineering interviews.
dbt handles the T (Transform) in ELT. Raw data lands in the warehouse via ingestion tools (Fivetran, Airbyte), then dbt transforms it into clean, tested, documented models ready for analysis. Unlike traditional ETL, transformations run directly in the warehouse using SQL.
dbt Project Structure: Staging, Intermediate and Marts
A well-organized dbt project follows a layered architecture. Each layer has a specific responsibility, and this separation prevents spaghetti SQL that becomes impossible to maintain.
The standard convention uses three layers:
- Staging — thin models that rename, cast, and clean raw source data. One staging model per source table.
- Intermediate — business logic that joins and aggregates staging models. These are internal building blocks.
- Marts — final models consumed by dashboards, reports, and analysts. Each mart represents a business entity or metric.
-- models/staging/stripe/stg_stripe__payments.sql
with source as (
select * from {{ source('stripe', 'payments') }}
),
renamed as (
select
id as payment_id,
amount / 100.0 as amount_usd, -- Stripe stores cents
status as payment_status,
created::timestamp as created_at,
customer_id
from source
where status != 'failed' -- Filter invalid records early
)
select * from renamed-- models/marts/finance/fct_monthly_revenue.sql
with payments as (
select * from {{ ref('stg_stripe__payments') }}
),
monthly as (
select
date_trunc('month', created_at) as revenue_month,
count(*) as total_transactions,
sum(amount_usd) as gross_revenue,
sum(case when payment_status = 'refunded' then amount_usd else 0 end) as refunds
from payments
group by 1
)
select
revenue_month,
total_transactions,
gross_revenue,
refunds,
gross_revenue - refunds as net_revenue -- Key business metric
from monthlyThe staging model handles type casting and renaming. The mart model contains the business logic. This separation means a change in the source schema only requires updating one staging model, not every downstream query.
Materializations: Choosing the Right Strategy
dbt supports four core materializations that control how models are persisted in the warehouse. Picking the wrong one leads to either slow queries or unnecessary compute costs.
| Materialization | Use Case | Rebuild Behavior |
|----------------|----------|-------------------|
| view | Lightweight staging models, low query frequency | Recreated as SQL view on each run |
| table | Mart models queried often by dashboards | Full table rebuild on each run |
| incremental | Large fact tables (events, logs) | Appends/merges new rows only |
| ephemeral | Reusable CTEs, never queried directly | Compiled inline as subquery |
Incremental models deserve special attention because they handle the most common performance bottleneck — processing billions of rows on every run.
-- models/marts/product/fct_page_views.sql
{{ config(
materialized='incremental',
unique_key='page_view_id',
incremental_strategy='merge'
) }}
with events as (
select
event_id as page_view_id,
user_id,
page_url,
session_id,
event_timestamp
from {{ ref('stg_snowplow__events') }}
where event_type = 'page_view'
{% if is_incremental() %}
-- Only process new events since last run
and event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
)
select * from eventsThe is_incremental() macro checks whether the model already exists. On the first run, dbt processes all data. On subsequent runs, only new rows are processed — reducing a 2-hour job to minutes.
Testing Data Quality with dbt
Data testing in dbt works at two levels: schema tests defined in YAML and custom tests written in SQL. Both run with dbt test and fail the pipeline if violations are found.
Schema tests cover the most common data quality checks without writing SQL:
# models/staging/stripe/_stripe__models.yml
version: 2
models:
- name: stg_stripe__payments
description: "Cleaned payment records from Stripe"
columns:
- name: payment_id
description: "Unique payment identifier"
tests:
- unique
- not_null
- name: payment_status
tests:
- accepted_values:
values: ['succeeded', 'pending', 'refunded']
- name: amount_usd
tests:
- not_null
- dbt_utils.expression_is_true:
expression: ">= 0" # No negative paymentsFor complex validation logic that YAML cannot express, custom singular tests fill the gap:
-- tests/assert_revenue_not_negative.sql
-- This test fails if any month has negative net revenue
select
revenue_month,
net_revenue
from {{ ref('fct_monthly_revenue') }}
where net_revenue < 0 -- Should never happenAny rows returned by a singular test indicate a failure. This pattern catches data corruption, upstream bugs, and logic errors before they reach dashboards.
Ready to ace your Data Analytics interviews?
Practice with our interactive simulators, flashcards, and technical tests.
Jinja and Macros: Reusable SQL Logic
dbt uses Jinja templating to make SQL dynamic. The two most important functions are ref() for model references and source() for raw data references. Beyond these, macros eliminate repetitive SQL patterns across the project.
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
({{ column_name }} / 100.0)::numeric(12, 2)
{% endmacro %}-- Usage in any model
select
payment_id,
{{ cents_to_dollars('amount_cents') }} as amount_usd,
{{ cents_to_dollars('tax_cents') }} as tax_usd
from {{ source('stripe', 'payments') }}The macro compiles to standard SQL at build time. This approach guarantees consistent currency conversion logic across every model that handles monetary values — a single change in the macro propagates everywhere.
dbt Interview Questions for Data Analysts
Interviewers testing dbt knowledge focus on practical understanding of the tool rather than memorized definitions. The following questions appear consistently in data analytics interviews.
Q1: What is the difference between ref() and source()?
source() points to raw tables loaded by ingestion tools and defined in a sources.yml file. ref() points to other dbt models. Using ref() builds the DAG (directed acyclic graph) that dbt uses to determine execution order and track lineage. Hardcoding table names instead of using ref() breaks lineage tracking and prevents dbt from running models in the correct order.
Q2: When should an incremental model be used instead of a table?
Incremental models are appropriate when the source data is append-only or has a reliable timestamp column, and the table is large enough that full rebuilds are too slow or expensive. Event tables, log data, and time-series data are typical candidates. Small dimension tables should remain as table materializations because the complexity of incremental logic is not justified.
Q3: How does dbt handle slowly changing dimensions (SCD Type 2)?
dbt snapshots implement SCD Type 2 tracking. A snapshot monitors a source table and records changes over time by adding dbt_valid_from and dbt_valid_to columns. Two strategies exist: timestamp (uses an updated_at column) and check (compares column values directly). Snapshots run with dbt snapshot, separately from dbt run.
Q4: Explain the staging/intermediate/marts pattern.
Staging models clean and rename raw source data — one model per source table, no joins, no aggregations. Intermediate models contain business logic: joins, filters, and calculations that serve as building blocks. Mart models are the final outputs consumed by analysts and dashboards. This layering ensures each model has a single responsibility and changes to source schemas only impact staging models.
Q5: How do custom tests differ from schema tests?
Schema tests are declared in YAML and cover standard checks: unique, not_null, accepted_values, and relationships. Custom singular tests are SQL queries stored in the tests/ directory — any rows returned indicate failure. Custom generic tests are parameterized macros that can be reused across models like schema tests. Use schema tests for common constraints and singular tests for complex business rules like "revenue should never be negative."
dbt Best Practices for Production Projects
Production dbt projects that scale beyond a handful of models require consistent conventions. These practices prevent the technical debt that accumulates when multiple analysts contribute to the same project.
Naming conventions keep the project navigable. Staging models use the prefix stg_, intermediate models use int_, and fact/dimension tables use fct_ or dim_. Source-specific subdirectories (models/staging/stripe/, models/staging/salesforce/) group related models.
Documentation lives next to the code. Every model and column should have a description in the corresponding YAML file. Running dbt docs generate produces a searchable documentation site with a visual DAG — essential for onboarding new team members and auditing data lineage.
Source freshness checks catch upstream failures before they cascade:
# models/staging/stripe/_stripe__sources.yml
version: 2
sources:
- name: stripe
database: raw
schema: stripe
freshness:
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
loaded_at_field: _fivetran_synced
tables:
- name: payments
- name: customersRunning dbt source freshness checks whether the source data has been updated within the expected window. A 24-hour gap in payment data signals an ingestion failure that needs investigation before running transformations.
For deeper SQL skills applied in dbt models, the guides on SQL window functions and CTEs and advanced SQL for interviews cover the query patterns used most frequently in staging and mart models.
Start practicing!
Test your knowledge with our interview simulators and technical tests.
Conclusion
- dbt handles the transformation layer in ELT pipelines, running SQL directly in the data warehouse without requiring a separate compute engine
- The staging/intermediate/marts pattern separates source cleaning from business logic, making changes predictable and isolated
- Incremental materializations reduce processing time for large tables by only handling new or updated rows
- Schema tests (YAML) cover standard constraints; singular tests (SQL) validate complex business rules
- Jinja macros eliminate repeated SQL logic and ensure consistency across models
- Source freshness checks catch upstream ingestion failures before they corrupt downstream models
- Interview preparation should focus on practical scenarios — DAG lineage, materialization trade-offs, and testing strategies — rather than memorized definitions
Start practicing!
Test your knowledge with our interview simulators and technical tests.
Tags
Share
Related articles

Advanced SQL for Data Analyst Interviews: Subqueries, Pivots and Query Optimization
Master advanced SQL techniques tested in data analyst interviews — correlated subqueries, pivot queries with conditional aggregation, EXPLAIN plans, and indexing strategies with real-world examples.

SQL for Data Analysts: Window Functions, CTEs and Advanced Queries
Master SQL window functions (ROW_NUMBER, RANK, LAG/LEAD), Common Table Expressions, and advanced query techniques essential for data analyst interviews and daily work.

Top 25 Data Analytics Interview Questions in 2026
Master the most asked data analytics interview questions in 2026. Covers SQL, Python, Power BI, statistics and behavioral questions with detailed answers and code examples.