CDC (Real-Time) vs. dbt (Batch): No Conflict
A common question: "If CDC streams data continuously and dbt runs in batches, won't they interfere with each other?" The short answer: no.
Timeline: CDC is a continuous stream (always running); dbt runs periodically (every hour or every 15 minutes).
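CDC and dbt operate on different timescales and do not block each other: CDC writes changes into the raw tables, while dbt reads those tables and writes its own model tables. They do share the same ClickHouse server's CPU, memory, and disk, but neither waits on a lock held by the other.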
Key Points
- CDC streams continuously — every INSERT, UPDATE, and DELETE arrives in ClickHouse within seconds
- dbt runs on-demand — triggered manually, by cron, or by an orchestrator (Airflow, Dagster, etc.) — not real-time
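- No race condition: each dbt model's query reads the data that exists when that query starts. ClickHouse SELECTs on MergeTree tables work on the set of data parts present at query start, so rows that CDC inserts mid-query do not cause conflicts; they are simply not visible to that query. The snapshot is per query, not per run, so models built later in a run may see a few more rows than models built earlier.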
- Typical scheduling — hourly, every 15 minutes, or after known data loads complete
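As a rough illustration of the 15-minute cadence, the sketch below computes the next run time aligned to interval boundaries counted from midnight (for 15 minutes these are the same slots the cron expression `*/15 * * * *` fires on) and wraps the standard `dbt run` CLI command in a loop. The function names `next_run_time` and `run_on_schedule` are hypothetical, and a plain sleep loop is only a stand-in for cron or an orchestrator such as Airflow or Dagster, which also handle retries and alerting.

```python
import subprocess
import time
from datetime import datetime, timedelta


def next_run_time(now: datetime, interval_minutes: int = 15) -> datetime:
    """Return the next slot strictly after `now`, aligned to `interval_minutes`
    counted from midnight (for 15 minutes: :00, :15, :30, :45)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    interval = timedelta(minutes=interval_minutes)
    # Whole intervals already elapsed today, plus one for the upcoming slot.
    slots = (now - midnight) // interval + 1
    return midnight + slots * interval


def run_on_schedule(interval_minutes: int = 15, project_dir: str = ".") -> None:
    """Hypothetical stand-in for cron or an orchestrator: invoke `dbt run` on
    each slot. Runs forever; it is defined here but never called."""
    while True:
        target = next_run_time(datetime.now(), interval_minutes)
        time.sleep(max(0.0, (target - datetime.now()).total_seconds()))
        # `dbt run` builds the models in dependency order; CDC keeps streaming meanwhile.
        result = subprocess.run(["dbt", "run", "--project-dir", project_dir])
        if result.returncode != 0:
            print(f"dbt run failed with exit code {result.returncode}")
```

Because the next slot is computed only after a run finishes, a run that overruns its interval skips the missed slot instead of overlapping with the next run.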
"This is a feature, not a bug — batch transforms are predictable, testable, and reproducible."
Side-by-Side Comparison
| Aspect |
CDC (Altinity) |
dbt Core |
| Mode |
Continuous stream |
Batch (on-demand) |
| Latency |
Seconds |
Minutes to hours |
| Trigger |
Automatic |
Manual / Cron / Orchestrator |
| Conflict risk |
N/A |
None — reads current snapshot |
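The Latency row can be made concrete with a back-of-the-envelope bound. Assuming runs start on a fixed interval and each run finishes before the next one starts, a row that CDC lands just after a model has read the raw tables waits for the next run's query and for that run to finish, so the worst-case staleness of a dbt-built table is roughly CDC lag + schedule interval + run duration. The helper name and the example figures below are illustrative, not measurements.

```python
from datetime import timedelta


def worst_case_staleness(cdc_lag: timedelta, interval: timedelta,
                         run_duration: timedelta) -> timedelta:
    """Upper bound on how old a source change can be before it shows up in a dbt model.

    Assumes runs start every `interval` and finish within `run_duration` (<= interval).
    Worst case: the row lands in ClickHouse just after the model's query took its
    snapshot, so it waits one interval for the next query plus that run's duration.
    """
    if run_duration > interval:
        raise ValueError("runs would overlap; the bound assumes run_duration <= interval")
    return cdc_lag + interval + run_duration


# Illustrative figures: ~5 s CDC lag, a 15-minute schedule, a 3-minute dbt run.
bound = worst_case_staleness(timedelta(seconds=5), timedelta(minutes=15), timedelta(minutes=3))
print(bound)  # 0:18:05
```

The few seconds of CDC lag are negligible next to the schedule interval, which is why the dbt column reads "minutes to hours".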
What Happens During a dbt Run?
- dbt connects to ClickHouse and compiles each model into a SQL statement; each statement reads the raw tables (or upstream models) as they exist when it starts
- It executes each model's SQL in dependency order (staging → dims → facts → views)
- While dbt is running, CDC continues streaming — new rows may land in raw tables
- Those new rows are picked up on the next dbt run, so no data is lost (for incremental models, provided the incremental filter does not skip rows that arrive late)
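The steps above can be simulated in a few lines of Python. This is a toy model, not dbt or ClickHouse code: the raw table is an append-only list of parts, each model's query captures the parts that exist when it starts (mirroring how a ClickHouse SELECT works on the parts present at query start), models run in dependency order via the standard library's `graphlib.TopologicalSorter`, and a simulated CDC batch lands partway through the run. The model names and the `RawTable` class are hypothetical, and for simplicity every model just counts the raw rows it can see.

```python
from graphlib import TopologicalSorter


class RawTable:
    """Toy append-only table: CDC inserts add new parts; existing parts never change."""

    def __init__(self) -> None:
        self.parts: list[list[int]] = []

    def insert(self, rows: list[int]) -> None:
        self.parts.append(rows)

    def snapshot(self) -> tuple[list[int], ...]:
        # A query sees only the parts that exist when it starts.
        return tuple(self.parts)


def dbt_run(raw: RawTable, dag: dict[str, set[str]],
            cdc_batches: dict[str, list[int]]) -> dict[str, int]:
    """Run models in dependency order and return the row count each model read.

    `cdc_batches` maps a model name to rows that CDC inserts right after that
    model's query has taken its snapshot (i.e. while the run is in progress).
    """
    rows_seen: dict[str, int] = {}
    for model in TopologicalSorter(dag).static_order():
        snapshot = raw.snapshot()
        if model in cdc_batches:
            raw.insert(cdc_batches[model])  # CDC keeps streaming mid-run
        rows_seen[model] = sum(len(part) for part in snapshot)
    return rows_seen


# staging -> dims -> facts -> views (each model maps to the set of models it depends on)
dag = {
    "stg_orders": set(),
    "dim_customers": {"stg_orders"},
    "fct_orders": {"stg_orders", "dim_customers"},
    "orders_view": {"fct_orders"},
}
raw = RawTable()
raw.insert([1, 2, 3])  # rows already present before the run

first = dbt_run(raw, dag, cdc_batches={"dim_customers": [4, 5]})
print(first)   # {'stg_orders': 3, 'dim_customers': 3, 'fct_orders': 5, 'orders_view': 5}
second = dbt_run(raw, dag, cdc_batches={})
print(second)  # {'stg_orders': 5, 'dim_customers': 5, 'fct_orders': 5, 'orders_view': 5}
```

In the first run, the two rows that land mid-run are visible to the models built after they arrive but not to the ones built before, so the snapshot is taken per query rather than once for the whole run. The second run catches every model up, and nothing is lost.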
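⚠ Note: If you need fresher analytics tables, consider incremental models with shorter scheduling intervals (for example, every few minutes). Scheduled batch runs will not approach the seconds-level latency of the raw CDC tables, let alone sub-second freshness. For most use cases, hourly or 15-minute runs provide an excellent balance of freshness and simplicity.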
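Incremental models are what make short intervals affordable: instead of rebuilding a table from scratch, each run processes only rows newer than what the target already holds. The sketch below simulates that high-water-mark pattern in plain Python. It assumes the raw rows carry a monotonically increasing sequence column (called `seq` here, a hypothetical name); in a dbt incremental model, the equivalent filter would typically sit inside an `is_incremental()` block that compares against the maximum value already in `{{ this }}`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    seq: int      # hypothetical monotonically increasing CDC sequence number
    amount: int


def incremental_run(raw: list[Row], target: list[Row]) -> int:
    """Append to `target` only the raw rows newer than its high-water mark.

    Returns the number of rows processed. Assumes rows become visible in `seq`
    order: a row that lands late with a lower `seq` than the current mark would
    be skipped by this filter.
    """
    high_water_mark = max((row.seq for row in target), default=0)
    new_rows = [row for row in raw if row.seq > high_water_mark]
    target.extend(new_rows)
    return len(new_rows)


raw = [Row(1, 10), Row(2, 20), Row(3, 30)]
target: list[Row] = []
print(incremental_run(raw, target))  # 3  (first run processes everything)
raw += [Row(4, 40), Row(5, 50)]      # CDC lands two more rows between runs
print(incremental_run(raw, target))  # 2  (only the new rows)
print(incremental_run(raw, target))  # 0  (nothing new since the last run)
```

Each run's cost scales with the rows that arrived since the previous run rather than with the size of the table, which is why a shorter interval does not have to mean more total work.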