Anatomy of a dbt Project

Most dbt projects follow the same conventional directory layout. Once you understand it, you can find your way around almost any dbt project.

Directory Tree

db/dbt/
├── Dockerfile              ← Docker packaging
├── dbt_project.yml         ← Master config
├── profiles.yml            ← Database connection
└── models/
    ├── staging/            ← Thin rename/cast layer
    │   ├── _staging.yml    ← Source declarations + tests
    │   ├── stg_orders.sql
    │   └── stg_customers.sql
    └── marts/              ← Star schema (the real work)
        ├── _marts.yml      ← Data dictionary + tests
        ├── dim_customer.sql  ← Dimension table
        ├── fact_sales.sql    ← Fact table
        └── vw_monthly.sql    ← Analytical view

What Each File Does

dbt_project.yml

Project name, profile reference, materialization defaults. The "root config" — every dbt project starts here.

profiles.yml

ClickHouse connection details: host, port, user, password, database. Tells dbt where to execute SQL.

staging/*.sql

Clean raw data — rename columns to consistent conventions, cast types, filter out soft-deleted rows. Thin wrappers over the raw CDC tables.

marts/*.sql

Build the star schema — dimension tables (who/what/where), fact tables (measurable events), and analytical views (pre-joined for dashboards).

_*.yml

YAML schema files containing: source declarations, column descriptions, and tests (not-null, unique, referential integrity, accepted values).
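To make the four test types concrete, here is a minimal Python sketch of what each one checks. It is an illustration only: dbt compiles these tests to SQL and runs them in the warehouse, where a test passes when the query for violating rows returns nothing. The helper functions and the sample rows are hypothetical and not part of the project.

```python
from collections import Counter

def not_null(rows, column):
    """Return rows where the column is missing or None."""
    return [r for r in rows if r.get(column) is None]

def unique(rows, column):
    """Return values that occur more than once (nulls are ignored)."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return sorted(v for v, n in counts.items() if n > 1)

def accepted_values(rows, column, values):
    """Return rows whose non-null value is outside the allowed set."""
    return [r for r in rows if r.get(column) is not None and r[column] not in values]

def relationships(child_rows, column, parent_rows, parent_column):
    """Return child rows whose non-null key has no matching parent row."""
    parent_keys = {p[parent_column] for p in parent_rows}
    return [r for r in child_rows if r.get(column) is not None and r[column] not in parent_keys]

# Hypothetical sample data, invented for illustration.
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "status": "shipped"},
    {"order_id": 11, "customer_id": 3, "status": "lost"},
    {"order_id": None, "customer_id": 2, "status": "placed"},
]

duplicate_customers = unique(customers, "customer_id")
missing_order_ids = not_null(orders, "order_id")
bad_statuses = accepted_values(orders, "status", {"placed", "shipped"})
orphan_orders = relationships(orders, "customer_id", customers, "customer_id")
```

Each function returns the offending rows or values, so an empty result means the check passes.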

Execution Order

dbt resolves dependencies automatically through the {{ ref() }} function. When one model references another, dbt runs the referenced model first.

⚠ How ref() works: If fact_sales.sql contains {{ ref('dim_customer') }}, dbt guarantees dim_customer is built before fact_sales. You never need to specify execution order manually.
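The dependency information lives in the model SQL itself. The Python sketch below pulls ref() targets out of a model body with a regular expression. It is a simplification for illustration: dbt renders the Jinja template rather than using a regex, and the SQL body and column names shown for fact_sales.sql are hypothetical.

```python
import re

# Simplified pattern for {{ ref('model_name') }}.
# Assumption: single-argument ref() with a quoted literal; dbt itself renders Jinja instead.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

def find_refs(sql: str) -> list[str]:
    """Return referenced model names in order of first appearance, without duplicates."""
    seen: list[str] = []
    for name in REF_PATTERN.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen

# Hypothetical body for fact_sales.sql; column names are invented for illustration.
FACT_SALES_SQL = """
select o.order_id, c.customer_key, o.amount
from {{ ref('stg_orders') }} as o
join {{ ref('dim_customer') }} as c on o.customer_id = c.customer_id
"""

fact_sales_deps = find_refs(FACT_SALES_SQL)
print(fact_sales_deps)  # ['stg_orders', 'dim_customer']
```

Every name found this way becomes an edge in the dependency graph: the referenced model must be built before the model that references it.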

DAG (Dependency Graph)

Staging            Dimension         Fact            View
stg_customers  ─▸  dim_customer  ─▸  fact_sales  ─▸  vw_sales_detail
stg_orders     ───────────────────▸  fact_sales  ─▸  vw_monthly_sales

dbt builds this graph automatically from the {{ ref() }} calls in your SQL files. Models with no dependencies run first (staging), then dimensions, then facts, then views.
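The ordering rule is a topological sort of the dependency graph. The sketch below reproduces it with Python's standard-library graphlib (Python 3.9+), using the edges from the diagram. It illustrates the idea and is not dbt's own scheduler. The `deps` mapping is written by hand here, whereas dbt derives it from the ref() calls.

```python
from graphlib import TopologicalSorter

# Each model maps to the set of models it references (its upstream dependencies).
deps = {
    "stg_customers": set(),
    "stg_orders": set(),
    "dim_customer": {"stg_customers"},
    "fact_sales": {"dim_customer", "stg_orders"},
    "vw_sales_detail": {"fact_sales"},
    "vw_monthly_sales": {"fact_sales"},
}

# static_order() yields every model only after all of its dependencies.
build_order = list(TopologicalSorter(deps).static_order())

for position, model in enumerate(build_order, start=1):
    print(position, model)
```

Several valid orders exist (for example, the two staging models can run in either order), so the exact sequence may vary between runs. What is guaranteed is that every model appears after everything it references. A circular reference would raise `graphlib.CycleError`.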

Two Layers, Two Jobs

Layer	Purpose	Materialization	Example
Staging	Clean + rename raw data	`view` (zero storage)	stg_orders, stg_customers
Marts	Star schema for analytics	`table` (persisted)	dim_customer, fact_sales, vw_monthly

Staging views are cheap — they don't store data, just provide a clean interface over raw tables. Mart tables are persisted for fast queries.