Anatomy of a dbt Project

Most dbt projects share the same basic directory structure. Once you understand it, you can find your way around almost any of them.

Directory Tree

db/dbt/
├── Dockerfile              ← Docker packaging
├── dbt_project.yml         ← Master config
├── profiles.yml            ← Database connection
└── models/
    ├── staging/            ← Thin rename/cast layer
    │   ├── _staging.yml    ← Source declarations + tests
    │   ├── stg_orders.sql
    │   └── stg_customers.sql
    └── marts/              ← Star schema (the real work)
        ├── _marts.yml      ← Data dictionary + tests
        ├── dim_customer.sql  ← Dimension table
        ├── fact_sales.sql    ← Fact table
        └── vw_monthly.sql    ← Analytical view

What Each File Does

dbt_project.yml
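Declares the project name, the profile dbt should connect with, and default materializations for each model folder. It is the root config: dbt treats the directory containing dbt_project.yml as the project root.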
profiles.yml
ClickHouse connection details: host, port, user, password, database. Tells dbt where to execute SQL.
staging/*.sql
Clean raw data: rename columns to consistent conventions, cast types, and filter out soft-deleted rows. These are thin wrappers over the raw CDC tables.
marts/*.sql
Build the star schema — dimension tables (who/what/where), fact tables (measurable events), and analytical views (pre-joined for dashboards).
_*.yml
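YAML property files containing source declarations, column descriptions, and tests (not_null, unique, relationships for referential integrity, accepted_values). The leading underscore is only a naming convention; dbt reads every .yml file under models/.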
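Each of the built-in tests listed above compiles to a query that selects the rows violating the rule, and the test passes when that query returns zero rows. The sketch below mimics that logic in plain Python on in-memory rows, purely to illustrate what each test checks. The sample rows and column names (order_id, customer_id, status) are hypothetical, and this is not how dbt actually executes tests.

```python
"""Plain-Python illustration of dbt's built-in generic tests.

Each function returns the *failing* rows (or values), mirroring dbt's
convention that a test passes when its query returns nothing.
All sample data below is hypothetical.
"""
from typing import Any, Iterable

Row = dict[str, Any]


def not_null(rows: Iterable[Row], column: str) -> list[Row]:
    # Failing rows: the column is missing or NULL (None).
    return [r for r in rows if r.get(column) is None]


def unique(rows: Iterable[Row], column: str) -> list[Any]:
    # Failing values: any non-null value that appears more than once.
    counts: dict[Any, int] = {}
    for r in rows:
        value = r.get(column)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return [v for v, n in counts.items() if n > 1]


def relationships(child: Iterable[Row], column: str,
                  parent: Iterable[Row], field: str) -> list[Row]:
    # Failing rows: a non-null foreign key with no matching parent key.
    parent_keys = {p.get(field) for p in parent}
    return [r for r in child
            if r.get(column) is not None and r.get(column) not in parent_keys]


def accepted_values(rows: Iterable[Row], column: str,
                    values: set[Any]) -> list[Row]:
    # Failing rows: a non-null value outside the allowed set.
    return [r for r in rows
            if r.get(column) is not None and r.get(column) not in values]


customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "status": "shipped"},
    {"order_id": 11, "customer_id": 3, "status": "lost"},
    {"order_id": None, "customer_id": 2, "status": "placed"},
]

print(not_null(orders, "order_id"))       # the row with order_id None
print(unique(customers, "customer_id"))   # [2]
print(relationships(orders, "customer_id", customers, "customer_id"))  # order 11
print(accepted_values(orders, "status", {"placed", "shipped", "completed"}))  # order 11
```

Like their SQL counterparts, the relationships and accepted_values checks ignore NULLs; pairing them with not_null is what makes a column fully constrained.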

Execution Order

dbt resolves dependencies through the {{ ref() }} function. When one model references another, dbt runs the referenced model first, so there is no execution order to maintain by hand.

How ref() works: If fact_sales.sql contains {{ ref('dim_customer') }}, dbt guarantees that dim_customer is built before fact_sales. At compile time, dbt also replaces the ref() call with the fully qualified name of the dim_customer relation in the target database.
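To see why ref() alone is enough to fix the order, here is a toy version of the idea in Python: scan each model's SQL for ref('...') calls, treat each call as an edge from the referenced model to the current one, and topologically sort the result with the standard library's graphlib. The model bodies are hypothetical and abbreviated, and dbt's real parser renders the Jinja template instead of using a regex, so treat this as a sketch of the concept only.

```python
import re
from graphlib import TopologicalSorter

# Simplified matcher for {{ ref('name') }} / {{ ref("name") }}. dbt itself
# renders the Jinja template to record ref() calls; a regex is only a sketch.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

# Hypothetical, abbreviated model bodies keyed by model name.
MODELS = {
    "stg_customers": "select * from {{ source('raw', 'customers') }}",
    "stg_orders": "select * from {{ source('raw', 'orders') }}",
    "dim_customer": "select * from {{ ref('stg_customers') }}",
    "fact_sales": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('dim_customer') }} c on o.customer_id = c.customer_id"
    ),
}


def build_graph(sql_by_model: dict[str, str]) -> dict[str, set[str]]:
    """Map each model to the set of models it references via ref()."""
    return {name: set(REF_PATTERN.findall(sql))
            for name, sql in sql_by_model.items()}


def execution_order(sql_by_model: dict[str, str]) -> list[str]:
    # TopologicalSorter expects node -> predecessors and yields parents
    # before children; it raises graphlib.CycleError on circular refs.
    return list(TopologicalSorter(build_graph(sql_by_model)).static_order())


print(execution_order(MODELS))
```

The relative order of independent models (the two staging models, for instance) is not fixed, but dim_customer always precedes fact_sales. Because the order comes only from the references, a new model that calls {{ ref('fact_sales') }} is automatically placed after fact_sales with no other change.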

DAG (Dependency Graph)

Staging          Dimension       Fact          View
stg_customers ─▸ dim_customer ─▸ fact_sales ─▸ vw_sales_detail
stg_orders ────────────────────▸ fact_sales ─▸ vw_monthly_sales

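dbt builds this graph automatically from the {{ ref() }} calls in your SQL files. A model runs only after all of its parents have finished, so models with no upstream models (the staging layer) run first, then dimensions, then facts, then views. fact_sales, for example, waits for both dim_customer and stg_orders.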
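The staging, dimension, fact, view progression can be made concrete by grouping the DAG's models into waves, where each wave holds every model whose parents all finished in earlier waves. The edges below are transcribed from the diagram; the wave loop uses graphlib's prepare/get_ready/done API. This is a simplification: dbt's scheduler does not wait for a whole wave, it starts each model as soon as its own parents are done, up to the configured number of threads.

```python
from graphlib import TopologicalSorter

# Edges transcribed from the DAG: each model maps to the models it ref()s.
PARENTS = {
    "stg_customers": set(),
    "stg_orders": set(),
    "dim_customer": {"stg_customers"},
    "fact_sales": {"dim_customer", "stg_orders"},
    "vw_sales_detail": {"fact_sales"},
    "vw_monthly_sales": {"fact_sales"},
}


def run_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group models into waves whose members could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable display
        waves.append(ready)
        sorter.done(*ready)  # pretend every model in this wave succeeded
    return waves


for i, wave in enumerate(run_waves(PARENTS)):
    print(i, wave)
# 0 ['stg_customers', 'stg_orders']
# 1 ['dim_customer']
# 2 ['fact_sales']
# 3 ['vw_monthly_sales', 'vw_sales_detail']
```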

Two Layers, Two Jobs

Layer     Purpose                     Materialization       Example
Staging   Clean + rename raw data     view (zero storage)   stg_orders, stg_customers
Marts     Star schema for analytics   table (persisted)     dim_customer, fact_sales, vw_monthly

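Staging views are cheap: they store no data and simply provide a clean interface over the raw tables, although their SQL re-runs every time they are queried. Mart tables are persisted, so queries read precomputed results instead of recomputing joins. The folder default is only a default; a mart model such as vw_monthly can set {{ config(materialized='view') }} in its own SQL file.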