How CDC (Change Data Capture) Works

CDC is a technique that captures row-level changes (INSERT, UPDATE, DELETE) from a database's transaction log and streams them to a target system — in this case, ClickHouse.

The Transaction Log

Most relational databases maintain a transaction log: a sequential record of every write operation. This log exists for crash recovery and replication, but CDC repurposes it as a change stream:

Database    | Transaction Log        | CDC Mechanism
PostgreSQL  | WAL (Write-Ahead Log)  | Logical replication slots
MySQL       | binlog (Binary Log)    | Row-based binlog reading
SQL Server  | Transaction Log        | CDC tables / Change Tracking (CT)
Non-intrusive: CDC reads the transaction log that the database already writes. It does not add triggers or polling queries, so the extra load on your production database is small.
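As a rough illustration of what has to be switched on at the source before the log can be read this way, the sketch below compares a dictionary of server settings against the values log-based CDC typically needs. `wal_level` (PostgreSQL) and `binlog_format` (MySQL) are real server settings; the `REQUIRED_SETTINGS` table and the `missing_settings` helper are hypothetical names made up for this example, and the list is not exhaustive.

```python
# Settings that log-based CDC typically needs on the source database.
# Illustrative and not exhaustive; check your connector's documentation.
REQUIRED_SETTINGS = {
    "postgresql": {"wal_level": "logical"},
    "mysql": {"binlog_format": "ROW"},
}


def missing_settings(db_kind, current):
    """Return {setting: (expected, actual)} for every setting that does not match."""
    required = REQUIRED_SETTINGS.get(db_kind.lower(), {})
    problems = {}
    for name, expected in required.items():
        actual = current.get(name)
        if str(actual).lower() != expected.lower():
            problems[name] = (expected, actual)
    return problems


if __name__ == "__main__":
    # wal_level=replica is enough for physical replication but not for logical decoding.
    print(missing_settings("postgresql", {"wal_level": "replica"}))
    print(missing_settings("mysql", {"binlog_format": "ROW"}))
```

An empty result means the settings in the table match. How you obtain the current values (for example with `SHOW wal_level` on PostgreSQL) is left out of the sketch.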

Data Flow

1. Source DB writes to the transaction log: this is normal database operation. Every INSERT, UPDATE, and DELETE is recorded in the WAL (PostgreSQL) or binlog (MySQL).
2. Debezium reads the transaction log: the embedded Debezium engine connects through the database's replication protocol and reads change events. No triggers or polling queries are added, so the impact on DB performance is minimal.
3. Altinity Connector receives change events: each event contains the table name, the operation type (c = create, u = update, d = delete), the before/after row values, and a transaction timestamp.
4. Connector writes to ClickHouse: INSERT, UPDATE, and DELETE operations are mirrored into ClickHouse tables using the ReplacingMergeTree engine, where each update or delete is written as a new row version that supersedes the older one.
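The last two steps can be sketched in a few lines. This is a simplified model, not the connector's actual code: the event fields `op`, `before`, `after`, and `ts_ms` follow the Debezium change-event envelope, while the `_version` and `is_deleted` column names, the use of `ts_ms` as the version, and the `to_row` / `merged_view` helpers are assumptions made for illustration.

```python
def to_row(event):
    """Turn a Debezium-style change event into a row for a ReplacingMergeTree table."""
    op = event["op"]  # "c" = create, "u" = update, "d" = delete
    # A delete carries the last known values in "before"; its "after" is None.
    values = event["before"] if op == "d" else event["after"]
    row = dict(values)
    row["_version"] = event["ts_ms"]           # hypothetical version column
    row["is_deleted"] = 1 if op == "d" else 0  # hypothetical delete marker
    return row


def merged_view(rows, key="id"):
    """Simulate what a query sees once duplicates are collapsed:
    keep the highest-version row per key, then hide rows marked as deleted."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row["_version"] >= current["_version"]:
            latest[row[key]] = row
    return [r for r in latest.values() if not r["is_deleted"]]


events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "ada"}, "ts_ms": 100},
    {"op": "c", "before": None, "after": {"id": 2, "name": "bob"}, "ts_ms": 110},
    {"op": "u", "before": {"id": 1, "name": "ada"},
     "after": {"id": 1, "name": "Ada"}, "ts_ms": 120},
    {"op": "d", "before": {"id": 2, "name": "bob"}, "after": None, "ts_ms": 130},
]

rows = [to_row(e) for e in events]  # four rows are appended; nothing is updated in place
visible = merged_view(rows)         # only the latest, non-deleted version of each key
print(visible)
# [{'id': 1, 'name': 'Ada', '_version': 120, 'is_deleted': 0}]
```

The point of the design is that ClickHouse only ever receives inserts. Updates and deletes become newer versions of the same key, and the older versions are discarded when the engine merges parts or when the query deduplicates explicitly.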

Initial Snapshot vs. Ongoing CDC

First Start: Full Snapshot

When the connector starts for the first time, it performs a full snapshot of all existing data in the configured tables. Every row is read and loaded into ClickHouse. This ensures your OLAP copy starts with a complete dataset.

After Snapshot: Streaming Mode

Once the snapshot completes, the connector switches to streaming mode. From this point on, only new changes (inserts, updates, deletes) are captured and applied. This is extremely efficient — only the delta is transferred.
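The two phases can be modeled with a toy example. It assumes the log is a plain Python list whose index serves as the position and that the target is a dictionary standing in for a ClickHouse table; the `sync` function and its parameters are hypothetical and only illustrate the control flow.

```python
def sync(table, log, target, start_position=None):
    """Bring `target` up to date and return the next log position to read.

    table:  dict of primary key -> row, the source table's current contents
    log:    list of (op, key, row) change entries; the list index is the position
    target: dict of primary key -> row, standing in for the ClickHouse copy
    """
    if start_position is None:
        # First start: note where the log currently ends, then copy every existing row.
        start_position = len(log)
        for key, row in table.items():
            target[key] = dict(row)

    # Streaming: apply only the entries written after the recorded position.
    for op, key, row in log[start_position:]:
        if op == "d":
            target.pop(key, None)
        else:  # "c" or "u"
            target[key] = dict(row)
    return len(log)


source = {1: {"name": "ada"}, 2: {"name": "bob"}}
log = [("c", 1, {"name": "ada"}), ("c", 2, {"name": "bob"})]  # history from before the first start
target = {}

position = sync(source, log, target)            # snapshot: copies 2 rows, replays nothing
log.append(("u", 1, {"name": "Ada"}))
log.append(("d", 2, None))
position = sync(source, log, target, position)  # streaming: applies just the 2 new entries
print(target, position)
# {1: {'name': 'Ada'}} 4
```

Recording the log position when the snapshot begins is what lets streaming start from the right place: older history is already reflected in the copied rows and does not need to be replayed.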

Resume After Restart

If the connector is stopped and restarted, it picks up where it left off. The last processed position in the transaction log is stored, so no data is lost and no duplicates are created.
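A minimal sketch of the resume behavior, assuming the position is kept in a small JSON file: the `OffsetStore` class, the file layout, and `run_once` are hypothetical and do not describe how the connector actually stores its offsets.

```python
import json
import os
import tempfile


class OffsetStore:
    """Persists the last processed log position in a small JSON file."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return 0  # nothing processed yet
        with open(self.path) as f:
            return json.load(f)["position"]

    def save(self, position):
        # Write to a temp file and rename it so a crash never leaves a half-written offset.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"position": position}, f)
        os.replace(tmp, self.path)


def run_once(log, store, applied):
    """Apply every log entry after the stored position, saving the offset as we go."""
    position = store.load()
    for entry in log[position:]:
        applied.append(entry)
        position += 1
        store.save(position)


store = OffsetStore(os.path.join(tempfile.mkdtemp(), "offset.json"))
log = ["insert 1", "update 1", "delete 1"]
applied = []

run_once(log[:2], store, applied)  # first run sees two entries, then "stops"
run_once(log, store, applied)      # restart: resumes at position 2
print(applied)
# ['insert 1', 'update 1', 'delete 1']
```

In this sketch a crash between applying an entry and saving the offset would replay that one entry on restart, so a real pipeline also needs the write itself to be idempotent or transactional to stay duplicate-free.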

Bottom line: You start the connector once. It loads everything, then keeps it in sync automatically. Stop it and restart it, and it resumes where it left off.

What Gets Synced

Aspect          | Synced? | Details
Schema          | YES     | AUTO_CREATE_TABLES=true auto-creates ClickHouse tables matching the source schema
Data (initial)  | YES     | Full snapshot of all existing rows on first start
Data (ongoing)  | YES     | All INSERT, UPDATE, DELETE operations streamed in real time
DDL changes     | PARTIAL | New columns are detected automatically. Table drops and renames require manual handling in ClickHouse.
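To make the DDL row concrete, here is a hedged sketch of how schema drift could be detected by comparing column and table lists. The helper names and the example type names are assumptions for illustration, and the connector's own detection logic may work differently. The generated statement uses ClickHouse's `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` syntax.

```python
def add_column_statements(table, source_columns, target_columns):
    """Build ALTER TABLE statements for columns present in the source but not the target.

    source_columns / target_columns: dict of column name -> ClickHouse type name.
    """
    statements = []
    for name, ch_type in source_columns.items():
        if name not in target_columns:
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS `{name}` {ch_type}"
            )
    return statements


def tables_needing_review(source_tables, target_tables):
    """Tables that exist in ClickHouse but no longer in the source (dropped or renamed)."""
    return sorted(set(target_tables) - set(source_tables))


stmts = add_column_statements(
    "orders",
    {"id": "Int64", "note": "Nullable(String)"},  # source now has a new "note" column
    {"id": "Int64"},                              # ClickHouse copy does not have it yet
)
print(stmts)
# ['ALTER TABLE orders ADD COLUMN IF NOT EXISTS `note` Nullable(String)']

print(tables_needing_review({"orders"}, {"orders", "old_orders"}))
# ['old_orders']
```

New columns can be applied mechanically, which is why they fall on the automatic side. A table that has vanished from the source could be either a drop or a rename, and the sketch only flags it for a person to decide.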