Change data capture (CDC) is a technique that captures row-level changes (INSERT, UPDATE, DELETE) from a database's transaction log and streams them to a target system, in this case ClickHouse.
Most production relational databases maintain a transaction log, a sequential record of every write operation. This log exists for crash recovery, but CDC repurposes it as a change stream:
| Database | Transaction Log | CDC Mechanism |
|---|---|---|
| PostgreSQL | WAL (Write-Ahead Log) | Logical replication slots |
| MySQL | binlog (Binary Log) | Row-based binlog reading |
| SQL Server | Transaction Log | CDC tables / Change Tracking (CT) |
On the ClickHouse side, the target tables use the ReplacingMergeTree engine for correct update/delete semantics.
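To make the update/delete semantics concrete, here is a minimal conceptual sketch in Python of what a ReplacingMergeTree-style collapse does: for each key, keep only the row with the highest version, then drop rows marked as deleted. This is a model of the idea, not ClickHouse's implementation. The field names `version` and `is_deleted` and the `collapse` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RowVersion:
    key: int                 # primary key from the source table
    version: int             # increases with every change, e.g. a log position
    is_deleted: bool         # tombstone flag written for DELETE events
    payload: dict = field(default_factory=dict)


def collapse(versions):
    """Keep the highest-version row per key, then drop tombstones."""
    latest = {}
    for row in versions:
        current = latest.get(row.key)
        if current is None or row.version >= current.version:
            latest[row.key] = row
    return {k: r.payload for k, r in latest.items() if not r.is_deleted}


events = [
    RowVersion(1, 1, False, {"name": "alice"}),   # INSERT key 1
    RowVersion(2, 2, False, {"name": "bob"}),     # INSERT key 2
    RowVersion(1, 3, False, {"name": "alice2"}),  # UPDATE key 1
    RowVersion(2, 4, True),                       # DELETE key 2
]
result = collapse(events)
```

Because ClickHouse performs this collapse during background merges, older versions of a row can remain visible for a while; queries that need the collapsed view typically use `FINAL` or an equivalent aggregation.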
When the connector starts for the first time, it performs a full snapshot of all existing data in the configured tables. Every row is read and loaded into ClickHouse. This ensures your OLAP copy starts with a complete dataset.
Once the snapshot completes, the connector switches to streaming mode. From this point on, only new changes (inserts, updates, deletes) are captured and applied. This is efficient because only the delta is transferred, not the full tables.
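The two phases can be sketched as follows. This is a simplified in-memory model, assuming each change event is an `(operation, key, row)` tuple; the function names `take_snapshot` and `apply_change` are hypothetical and do not correspond to the connector's actual API.

```python
def take_snapshot(source_table):
    """Phase 1: copy every existing row into a fresh target."""
    return {key: dict(row) for key, row in source_table.items()}


def apply_change(target, change):
    """Phase 2: apply a single row-level change event to the target."""
    op, key, row = change
    if op in ("INSERT", "UPDATE"):
        target[key] = dict(row)
    elif op == "DELETE":
        target.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


source_table = {1: {"name": "alice"}, 2: {"name": "bob"}}
changes = [
    ("INSERT", 3, {"name": "carol"}),
    ("UPDATE", 1, {"name": "alice2"}),
    ("DELETE", 2, None),
]

target = take_snapshot(source_table)   # full copy, done once
rows_after_snapshot = len(target)
for change in changes:                 # afterwards, only deltas are applied
    apply_change(target, change)
```

Only three small events are transferred in the second phase, regardless of how large the source table is.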
If the connector is stopped and restarted, it picks up where it left off. The last processed position in the transaction log is stored, so no data is lost and no duplicates are created.
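The resume behavior can be illustrated with a small sketch. It assumes the log is an ordered list and that the last processed position is kept in a durable store, modeled here as a plain dict; `run_until` and its `stop_after` parameter are hypothetical names used only to simulate a stop and restart.

```python
def run_until(log, store, sink, stop_after=None):
    """Process log entries starting at the stored position.

    Returns the number of entries handled in this run.
    """
    position = store.get("position", 0)   # resume point; 0 on first start
    handled = 0
    while position < len(log):
        if stop_after is not None and handled >= stop_after:
            break                          # simulate the connector being stopped
        sink.append(log[position])         # deliver the change to the target
        position += 1
        store["position"] = position       # record progress after each entry
        handled += 1
    return handled


log = ["e0", "e1", "e2", "e3", "e4"]
store = {}   # stands in for durable offset storage
sink = []    # stands in for the target system

first_run = run_until(log, store, sink, stop_after=2)   # stopped early
second_run = run_until(log, store, sink)                # restarted later
```

After the restart, the second run begins at position 2, so every entry reaches the sink exactly once in this model.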
| Aspect | Synced? | Details |
|---|---|---|
| Schema | YES | `AUTO_CREATE_TABLES=true` creates ClickHouse tables that match the source schema |
| Data (initial) | YES | Full snapshot of all existing rows on first start |
| Data (ongoing) | YES | All INSERT, UPDATE, DELETE operations streamed in real-time |
| DDL changes | PARTIAL | New columns are detected automatically. Table drops and renames require manual handling in ClickHouse. |
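As an illustration of how new columns could be propagated, the sketch below compares source and target column sets and generates ClickHouse `ALTER TABLE ... ADD COLUMN` statements for the missing ones. This is not the connector's actual logic. The helper `plan_added_columns` is hypothetical, and the example assumes the source types have already been mapped to ClickHouse types.

```python
def plan_added_columns(table, source_columns, target_columns):
    """Return ALTER statements for columns present in the source but not the target."""
    statements = []
    for name, column_type in source_columns.items():
        if name not in target_columns:
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {name} {column_type}"
            )
    return statements


# Column name -> ClickHouse type (type mapping assumed to be done already)
source_columns = {"id": "Int64", "name": "String", "email": "String"}
target_columns = {"id": "Int64", "name": "String"}

statements = plan_added_columns("users", source_columns, target_columns)
```

Table drops and renames are deliberately left out of this sketch, since they require manual handling in ClickHouse.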