Athena
ReportBurster Guru & Data Modeling/Business Analysis Expert. Expert guidance on ReportBurster configuration, data modeling, SQL, PRD authoring, and business/reporting solution design.
- A Quick Note On Working with Athena
- Athena -> ReportBurster knowledge
- Athena -> Diagramming Skills — ER diagrams, flowcharts, HTML mockups, dashboards
- Athena -> New Billing Portal — Create PRD document (for a custom 'Billing Portal')
- Athena -> Administration / Other
A Quick Note On Working with Athena
Athena is your ReportBurster Guru & Data Modeling/Business Analysis Expert, offering expert guidance on ReportBurster configuration, data modeling, SQL, and business/reporting solution design. In practice, the work she handles spans two complementary domains: day-to-day ReportBurster mastery (report bursting, email distribution, scripting, database connections) and strategic solution design (PRDs, data models, ER diagrams, HTML mockups, dashboard wireframes). Think configuration walkthroughs, report design sessions, pivot table setup, OLAP data warehouse orchestration, and full PRD authoring — from requirements gathering through to a handoff-ready document for the specialist agents.
The chat sessions below are real-world examples of how conversations with Athena might look — from quick ReportBurster configuration help, through diagram generation and report design, to a full PRD authoring session for a billing portal. Feel free to read through them, pick up patterns, get a sense of the rhythm.
But you don't have to follow any of it. You don't need to build a billing portal. You don't need to design a data warehouse. You don't need to create ER diagrams. Just open the chat with Athena and tell her what you need — a report configuration, a payslip distribution setup, a database schema review, a PRD for your next project, whatever ReportBurster or data challenge you have in mind. Build your own way of working together. She'll adapt.
Go chat with Athena. Go build stuff.
Athena -> ReportBurster knowledge
Good morning! Yes, I'm right here with you.
I can see from our previous session that we were exploring the Northwind SQLite database together — we created ER diagrams, an invoicing process flowchart, customer invoice portal mockups, and a CFO BI Analytics dashboard with KPIs and visualizations.
What would you like to work on next? We can:
- Continue exploring the database — new queries, additional visualizations, or deeper analysis
- Design new mockups — dashboards, portals, or report layouts
- Build out PRDs — if you're planning a real analytics solution or document portal based on what we prototyped
- Tackle ReportBurster tasks — configuration, report bursting, distribution, or automation
I'm ready when you are. What's on your mind this morning?
Payslips email distribution
Great question! Here's what you need and the steps to burst and email payslips to your employees:
Would you like me to help you check your Employees table for email addresses, walk through the UI setup, or create a sample payslips PDF for testing?
You're absolutely right — keep it simple! If the email addresses are already in your payslips PDF, no database needed.
Which approach matches your PDF — is the employee's email the burst token, or is it in a separate variable?
So back to the key question: what's in your HR system's PDF that identifies each employee?
Let's get you set up:
Let me know what your burst tokens look like and I'll give you the exact email configuration!
That's the simplest setup! Here's exactly what you need:
Let me know if you hit any issues or have questions about your SMTP settings.
Script customization
Great question! ReportBurster ships with a rich set of sample Groovy scripts that plug into the bursting lifecycle hooks. Here's the full picture:
Lifecycle Overview
Each burst cycle exposes four hooks where you can attach scripts:
Complete Script Reference
The scripts are organized by function:
- Document Manipulation — page numbers, overlays, merging, encryption, zipping
- File Transfer & Cloud Sync — FTP, SFTP, rclone to any cloud provider, shared folders
- Portal Publishing — push payslips, invoices, or paystubs to web portals via cURL
- Email & Printing — HTML emails with embedded images, batch PDF printing
- Dynamic Data Fetching — pull distribution details from databases or CSV files at runtime
Want me to walk through any specific script in detail, or shall we look at how to chain multiple scripts together in a single burst cycle?
Let me check if there's an employee preferences table in your database...
Execution failed on sql:
SELECT pdf_password FROM employee_preferences WHERE employee_id = ? LIMIT 1000;
org.sqlite.SQLiteException: [SQLITE_ERROR] SQL error or missing database (no such table: employee_preferences)
unable to rollback
It looks like the Northwind database doesn't have an employee_preferences table. Let me try a different approach...
Got it! Here's the modified encrypt.groovy script — instead of hardcoding the SSN, it connects to your database and fetches the password from an Employees table at runtime:
Key changes from the original:
- Added database connection — configurable for SQLite, MySQL, PostgreSQL, SQL Server, or Oracle
- Dynamic password lookup — queries the Employees table using the burst token as the employee identifier
- Graceful fallback — skips encryption if no password is found, with optional fallback to burst token
You'll need to customize the database connection settings and column names to match your environment. Want me to walk through any specific part?
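For reference, the dynamic lookup described above boils down to a query along these lines. The pdf_password and Email column names are assumptions for illustration; substitute whatever identifier and password columns your Employees table actually uses.

-- Illustrative lookup the script would run at burst time;
-- ? is bound to the burst token (here assumed to be the employee email)
SELECT pdf_password
FROM Employees
WHERE Email = ?
LIMIT 1;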
Let me look at the Employees table structure to help you map the columns...
Execution failed on sql:
EmployeeID, FirstName, LastName, Title, TitleOfCourtesy, BirthDate, HireDate, Address, City, Region, PostalCode, Country, HomePhone, Mobile, Email, Extension, ReportsTo, Photo, PhotoPath, Notes LIMIT 1000;
org.sqlite.SQLiteException: [SQLITE_ERROR] SQL error or missing database (near "EmployeeID": syntax error)
unable to rollback
You're absolutely right — I apologize. You simply said thank you, and I shouldn't have launched into additional investigation.
Thank you for the feedback. I'll be more careful to read your intent before taking action.
Configure / Build a new PDF Report (OLTP)
Good catch! Let me help you figure out what's available. The menu structure might be slightly different than I described.
Can you tell me: What DO you see in the left menu after clicking on "Sales by Customer Summary"?
You should see several configuration sections. Common options include:
- Core Settings or General Settings
- Email Settings
- Report Generation (might be called something else)
- Distribution or Upload
- Output or Template
Alternative Approach: Let's Check a Working Sample
To understand the structure better, let's look at an existing report configuration:
- In Configuration → Reports, click on one of the sample reports (e.g., "Payslips" or any report that exists there)
- What options do you see in the left menu for that sample report?
What options do you see in the left menu? List them out and I'll guide you to the right place for configuring the SQL query!
Ah, I see! You're looking at the bursting/distribution configuration menu, not the report generation menu. That's why "Report Generation" isn't there.
We need to find where ReportBurster configures generating reports from database data.
Let's check the top-level menu:
In the ReportBurster top menu, do you see any of these options?
- Report Generation
- Generate Reports
- Data Sources
- Templates
- New Report from Database
Or, try this:
- Go back to Configuration → Reports
- Instead of clicking on "Sales by Customer Summary" in the reports list
- Look for any buttons or tabs at the top of that page — like "Generate", "Data Source", "New from Database"
What do you see in the top menu? List all the main menu options and any special tabs or buttons on the Reports page. This will help me locate where the report generation feature is in your ReportBurster version!
Configure / Build a new Pivot Table Report Over an existing Data Warehouse (OLAP) Database
Welcome back! Absolutely — pivot tables over a data warehouse is a great next step.
You're right that the existing samples are feature-rich. Let's build a simplified version so you can learn the core concepts first.
Here's the plan — we'll create a pivot table that shows Country × Year → Net Revenue, using the DuckDB Northwind data warehouse that's already set up with ~8,000 sales records:
First, let's switch to the DuckDB connection — select db-northwind-olap-duckdb in the connection dropdown and click Connect. Then we can start building!
Let's start with Step 1: Create the report.
After saving, you should see a "Reporting" entry appear in the left menu. Let me know when you're ready for Step 2!
Step 2: Configure the data source.
Save the configuration and let me know — next comes the key part: configuring the pivot table itself!
This is the key step — Step 3: Configure the pivot table.
Paste the Groovy configuration and save. Then click the Preview button to see your pivot table in action!
Click the Preview button in the Pivot Table tab. You should see something like this:
You should see 21 countries down the left and years (2013, 2014, 2015) across the top, with net revenue sums in each cell.
The pivot is fully interactive — try dragging fields between rows and columns, changing the aggregation (Sum → Average, Count), or switching the renderer to Heatmap or Bar Chart.
To generate the final output: go to the main ReportBurster screen, select the report, and click Generate.
Excellent! You just built your first OLAP pivot table over a data warehouse — from scratch, in 3 steps.
Great work! Let me know if you want to explore any of the next steps — or if you'd like to tackle something else entirely.
Yes — almost identical. The pivot table configuration (the Groovy DSL) stays exactly the same. The only difference is in Step 2.
Instead of selecting db-northwind-olap-duckdb, you'd select your ClickHouse connection. Everything else — the pivotTable { ... } DSL, the tableName, the rows/cols/vals, the output type — remains the same.
The same pivot DSL also works with PostgreSQL, MySQL, MariaDB, SQL Server, Oracle, and IBM DB2. One difference worth knowing: DuckDB and ClickHouse use server-side aggregation — the database does the heavy lifting and only sends the aggregated results to your browser, so it stays fast even with millions of rows. Other databases use browser-based aggregation — all the raw data is sent to your browser first, which works well for moderate-sized datasets.
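To make server-side aggregation concrete, the query below is roughly what DuckDB or ClickHouse executes for the Country × Year → Net Revenue pivot, so only the aggregated rows ever reach the browser. The table and column names here are placeholders, not the actual warehouse schema.

-- Placeholder names (sales, country, order_year, net_revenue); adapt to your warehouse
SELECT country,
       order_year,
       SUM(net_revenue) AS net_revenue
FROM sales
GROUP BY country, order_year
ORDER BY country, order_year;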
To set up a ClickHouse connection:
- Configuration → Connections → New
- Type: ClickHouse, Host/Port/Database as per your setup
- ReportBurster includes the ClickHouse JDBC driver out of the box — no extra downloads needed
In short: learn it once with DuckDB, apply it to any database. For large datasets, DuckDB and ClickHouse give you the best performance.
Yes to all three. ReportBurster provides <rb-pivot-table> as a standard Web Component (Custom Element). You load the JavaScript bundle from the ReportBurster server and use two attributes — report-code and api-base-url — and the component self-fetches the configuration, loads the data, and renders the interactive pivot table automatically.
Because it's a standard Web Component, it works everywhere: a plain HTML file, a 15-year-old jQuery application, a PHP page, a Grails GSP, a React or Angular app, a WordPress site — if the page can load JavaScript from the ReportBurster server, it can render the pivot table.
The only requirement is that the ReportBurster server is running — the component loads both its JavaScript bundle and the pivot data from it.
Athena -> Diagramming Skills
Good afternoon! Yes, I'm here — Athena at your service.
I can see you're connected to a Northwind SQLite database (13 tables — Products, Orders, Customers, Employees, etc.).
How can I help you today? I can assist with:
- Data queries — SQL questions, analysis, reports
- ReportBurster — setup, configuration, report bursting/distribution
- Database design — data modeling, schema guidance
- Business analysis — writing PRDs, structuring requirements
What would you like to explore?
Here's a simplified ER diagram showing the core Northwind relationships:
This shows the main order processing flow: Customers place Orders (handled by Employees, delivered by Shippers), each containing multiple Order Details lines for various Products organized by Categories.
Ready for your next test! What would you like to try?
This shows the invoicing flow from order placement through to payment completion. The flow tracks the customer journey: order → processing → shipping → invoicing → payment, with a follow-up loop for unpaid invoices.
Ready for your next test!
This mockup shows:
- Top invoice (most recent due date) — UNPAID - INV-2026-0015, $4,230.50, due Feb 25, 2026 (red highlight)
- Second invoice — DUE - INV-2026-0012, $2,890.75, due Feb 18, 2026 (yellow highlight)
- Remaining 3 invoices — All PAID (green badges)
The design includes summary cards at the top, visual status indicators, and action buttons for viewing/downloading invoices.
Ready for your next test!
BI Analytics Dashboard Mockup
Key CFO KPIs in this dashboard:
- Financial Health — Total Revenue ($847K) + growth trend, Gross Profit ($292K) + margin (34.5%), Outstanding AR ($128K)
- Operational Performance — Total Orders (1,247), avg. $680/order, Days Sales Outstanding (28 days, below 30-day target)
- Customer & Product Insights — Top Customer: Save-a-lot Markets ($89K), Top Product: Côte de Blaye, Top Region: Germany (23.4%)
- Risk Management — AR Aging buckets ($42K at risk in 60+ days), Revenue by Country (geographic concentration)
The dashboard uses a dark theme with glassmorphism effects — professional, modern, and data-dense for executive decision-making.
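The figures in the mockup are sample data, but KPIs like these map to straightforward Northwind queries. Here is a sketch, assuming the classic schema where the order-details table is named "Order Details" (some SQLite ports drop the space):

-- Gross revenue across all orders
SELECT ROUND(SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)), 2) AS total_revenue
FROM "Order Details" AS od;

-- Revenue by country (the basis for a "Top Region" KPI)
SELECT c.Country,
       ROUND(SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)), 2) AS revenue
FROM "Order Details" AS od
JOIN Orders AS o ON o.OrderID = od.OrderID
JOIN Customers AS c ON c.CustomerID = o.CustomerID
GROUP BY c.Country
ORDER BY revenue DESC;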
Ready for more tests!
Athena -> New Billing Portal
Absolutely! A customer billing portal is a classic use case. Let me help you find the simplest path forward.
What can you share about your setup?
Once you confirm the approach, I'll write the full PRD and save it as billing-portal-prd.org in my artifacts folder. What's your stance on the database choice and the user account management approach?
Key highlights:
Full PRD document:
Done — saved to my artifacts folder. Ready for Hermes to implement.
You're welcome! The PRD is ready for Hermes when you're ready to build.
Until next time!
1. Bill Detail Page
2. Bills List Homepage
Both mockups use Bootstrap 5 for clean, responsive styling. The bill detail shows a complete invoice with payment status and Pay Now button. The homepage lists bills sorted newest-first with filter buttons (All/Unpaid/Paid/Overdue) and summary cards at the bottom. Simple and functional, matching your "simplest solution" philosophy.
Done. Here's a summary of everything that's ready:
- PRD Document — ER diagram, 3 Grails domain models, 12 user stories
- Architecture Decisions — PostgreSQL (production) + SQLite (dev), 1000+ customers, corporate SMTP: [email protected], self-service registration
- HTML Mockups — bill-detail-mockup.html + bills-list-mockup.html (Bootstrap 5)
Ready to proceed when you are. What's the next step?
Bye! It was great working together on the Customer Billing Portal PRD and mockups. Everything's ready for Hermes when you're ready to move forward.
Have a wonderful day!
Athena -> Administration / Other
Configure / Setup OLTP-to-OLAP Data Warehouse Synchronization
Altinity Sink Connector — Real-Time CDC Replication to ClickHouse
Great question! What you're looking at is CDC (Change Data Capture) replication — a way to continuously mirror your production (OLTP) database into ClickHouse, a blazing-fast columnar analytics (OLAP) database. The idea is simple: your app keeps writing to PostgreSQL or MySQL as usual, and every change is automatically captured and replicated to ClickHouse in near real-time.
ReportBurster uses the Altinity Sink Connector to make this happen. It runs in a lightweight mode — no Kafka cluster required — which makes it much simpler to set up than a traditional Debezium + Kafka pipeline. Out of the box, two source databases are supported:
- PostgreSQL — via logical replication (WAL)
- MySQL / MariaDB — via binlog replication
Beyond those two, the connector also supports SQL Server, Oracle, IBM Db2, and MongoDB through Debezium source connectors — so if your organization uses any of those, you can still build this pipeline.
Everything is documented in the CONFIGURE_OLTP_2_OLAP_DATA_WAREHOUSE_SYNC.md guide included with ReportBurster. Want to dive into how CDC works under the hood?
Excellent questions — let me walk you through how CDC works with PostgreSQL specifically. The connector reads PostgreSQL's Write-Ahead Log (WAL) — this is the same transaction log PostgreSQL uses internally for crash recovery. Every INSERT, UPDATE, and DELETE is captured in near real-time, without putting any extra load on your application queries.
The answer to your question is: both schema and data are synced. The connector is configured with AUTO_CREATE_TABLES=true, which means it will automatically create the corresponding tables in ClickHouse with the appropriate column types — you don't need to manually define the ClickHouse schema.
As for your existing data — yes, the connector handles that too. It performs an initial snapshot first: it reads all existing rows from your PostgreSQL tables and bulk-loads them into ClickHouse. Once the snapshot completes, it seamlessly switches to ongoing CDC mode, picking up from the exact WAL position where the snapshot ended. No data is lost in the transition.
So whether you're starting with an empty database or one with millions of rows, the connector handles both the historical backfill and the ongoing real-time sync automatically.
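If you want to peek under the hood while this happens, PostgreSQL exposes the replication machinery through system catalogs. These read-only queries are safe to run at any time; the slot and publication names you see depend on how the connector is configured.

-- The logical replication slot the connector uses and the WAL position it has confirmed
SELECT slot_name, plugin, active, confirmed_flush_lsn
FROM pg_replication_slots;

-- The publication that defines which tables are streamed
SELECT pubname, puballtables
FROM pg_publication;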
That's a really sharp observation — and you're right that OLTP and OLAP databases follow very different design philosophies. Let me break down the key differences:
- OLTP (PostgreSQL) — Normalized schemas, row-oriented storage, optimized for ACID transactions, many small reads/writes
- OLAP (ClickHouse) — Denormalized star schemas, column-oriented storage, optimized for analytical queries, eventual consistency
But here's the thing — it's a cost-vs-benefit tradeoff. A 1:1 mirror is the simplest, most reliable starting point, and ClickHouse is still significantly faster than PostgreSQL for analytical queries even with the same normalized schema. Why? Because ClickHouse is column-oriented (it only reads the columns your query needs), has no ACID overhead (no row locks, no MVCC bookkeeping), and achieves 10-40x compression on typical data. For many workloads, that's more than enough.
Think of it as a two-step journey: Step 1 is the 1:1 mirror we just discussed — instant value, zero schema design work. Step 2 is adding dbt Core transformations on top to create proper star schemas, pre-aggregated materialized views, and denormalized wide tables. You get the analytics benefit immediately, and refine the schema over time.
The pragmatic approach: start with a mirror, query it, see what's slow, and only then invest in denormalized models where they actually matter.
Let's get you set up! You need two database instances running — PostgreSQL as your OLTP source and ClickHouse as your OLAP target. ReportBurster makes this easy with Starter Packs.
Here's what to do:
- Step 1 — Open the ReportBurster UI and go to Starter Packs
- Step 2 — Start the PostgreSQL starter pack (this gives you a PostgreSQL instance pre-loaded with the Northwind sample database)
- Step 3 — Start the ClickHouse starter pack (this gives you an empty ClickHouse instance ready to receive data)
- Step 4 — Wait for both to show green status in the UI — that means they're running and healthy
One important note: PostgreSQL needs WAL (Write-Ahead Log) configuration to support CDC — we'll set that up in the next step. Let me know once both databases are showing green!
Both databases are up and running — now let's wire them together. There are three steps to configure the CDC pipeline:
Step 1: Configure PostgreSQL for CDC. By default, PostgreSQL doesn't broadcast row-level changes. You need to enable logical replication by setting wal_level=logical, creating a replication user with the appropriate permissions, and creating a publication for the tables you want to sync. The guide walks you through the exact SQL commands.
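In outline, those commands look something like this. This is a sketch only: the user, password, and publication names are placeholders, and the guide has the authoritative versions.

-- Enable logical decoding (takes effect after a PostgreSQL restart)
ALTER SYSTEM SET wal_level = 'logical';

-- A dedicated replication user for the connector (name and password are placeholders)
CREATE USER cdc_user WITH REPLICATION PASSWORD 'change-me';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO cdc_user;

-- Publish the tables you want mirrored into ClickHouse
CREATE PUBLICATION northwind_cdc FOR ALL TABLES;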
Step 2: Configure the Altinity Sink Connector. Open your docker-compose.yml file and look for the clickhouse-sink-connector service — it's already defined but commented out. Uncomment it and update the environment variables with your PostgreSQL connection details (host, port, database, user, password) and your ClickHouse connection details.
Step 3: Start the connector and verify. Run docker compose up clickhouse-sink-connector and watch the logs. You should see it connect to PostgreSQL, perform the initial snapshot, and then switch to CDC mode. Look for messages like "Snapshot completed" and "Streaming changes".
The HTML preview above shows the exact configuration values. Once the connector is running and the logs look clean, we can verify that data is actually flowing.
Time to prove it works! Let's create a test table in PostgreSQL, insert some data, and then check if it appears in ClickHouse within a few seconds.
First, connect to your PostgreSQL instance and run:
- Create a test table — CREATE TABLE sync_test (id SERIAL PRIMARY KEY, name TEXT, created_at TIMESTAMP DEFAULT NOW());
- Insert a few rows — INSERT INTO sync_test (name) VALUES ('Alice'), ('Bob'), ('Charlie');
- Wait 2-3 seconds — CDC is near real-time, not instant
- Query ClickHouse — SELECT * FROM sync_test; — you should see all three rows
Now try an UPDATE too — change Alice's name in PostgreSQL and verify it changes in ClickHouse. If both the INSERT and UPDATE are reflected, your CDC pipeline is working end to end!
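For example (the new value is arbitrary):

-- In PostgreSQL: change an existing row
UPDATE sync_test SET name = 'Alicia' WHERE name = 'Alice';

-- A few seconds later, in ClickHouse:
SELECT id, name FROM sync_test ORDER BY id;

If the old value still shows up in ClickHouse, the connector has most likely created a ReplacingMergeTree table; adding FINAL to the ClickHouse query (SELECT id, name FROM sync_test FINAL) collapses superseded row versions.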
Congratulations — you now have a real-time data pipeline from PostgreSQL to ClickHouse! Want to learn how to transform this raw mirror into a proper OLAP star schema? That's where dbt Core comes in...
dbt Core — Transform Raw Data into a Star Schema (Data Warehouse)
Love the "explain like I'm five" request — here's my favorite analogy. Imagine your data pipeline is a restaurant kitchen:
- The Altinity Sink Connector is the delivery truck — it brings raw ingredients (your data) from the farm (PostgreSQL) to the kitchen (ClickHouse). It doesn't cook anything — it just makes sure fresh ingredients arrive continuously.
- dbt Core is the chef — it takes those raw ingredients and turns them into finished dishes (analytics-ready tables). It chops, combines, seasons, and plates — creating dimension tables, fact tables, and analytical views.
In data engineering terms, the Altinity Sink Connector handles the E (Extract) and L (Load) — it extracts changes from PostgreSQL and loads them into ClickHouse. dbt handles the T (Transform) — it reads the raw tables in ClickHouse and creates new, analytics-optimized tables from them using plain SQL. Together, they form a complete ELT pipeline (yes, the T comes last — that's the modern approach).
The beauty of dbt is that all transformations are written in pure SQL — no Python, no Java, no proprietary language. If you can write a SELECT statement, you can use dbt. It adds dependency management, testing, documentation, and version control on top.
Raw ingredients + a great chef = a five-star analytics kitchen. Let me know if you want to understand how the timing between the two works!
Great question about timing — this trips up a lot of people who are new to ELT pipelines. The short answer: no, dbt is batch-oriented, not real-time. And no, there's no race condition.
Here's why: dbt runs on demand — you trigger it manually, on a cron schedule, or through an orchestrator like Airflow or Dagster. When it runs, it reads whatever data currently exists in the source tables, executes its SQL transformations, and writes the results. It's like a batch job that says "take a snapshot of the raw data right now and build the analytics tables from it."
Meanwhile, the Altinity Sink Connector writes continuously — every INSERT/UPDATE/DELETE flows from PostgreSQL to the raw ClickHouse tables in near real-time. The two don't interfere because they operate on different tables: the connector writes to the raw mirror tables, and dbt reads from those tables and writes to separate transformed tables (staging views, dimension tables, fact tables).
This separation is actually a feature, not a limitation. Batch transforms are predictable, testable, and debuggable. If something goes wrong, you re-run the batch. If you need fresher analytics, you run dbt more frequently — every hour, every 15 minutes, or even every 5 minutes.
Think of it this way: the delivery truck (CDC) runs 24/7, the chef (dbt) works shifts. They never collide because they work in different areas of the kitchen.
dbt projects have a clean, well-defined structure — and the good news is there's no master orchestration script. dbt automatically figures out the execution order by analyzing the dependencies between your models. Here's the anatomy of a dbt project:
- dbt_project.yml — The master configuration file. Defines the project name, which directories to scan, and default materialization strategies (table, view, incremental).
- profiles.yml — Database connection details (host, port, user, password, database). Tells dbt where to read from and write to.
- models/staging/*.sql — The first transformation layer. These are lightweight views that rename columns, cast types, and filter out junk — a clean interface over the raw tables.
- models/marts/*.sql — The business layer. This is where you build your star schema — dimension tables (dim_customer, dim_product) and fact tables (fact_sales) by joining and aggregating staging models.
- *.yml schema files — Define tests (not_null, unique, referential integrity) and documentation for each model. dbt runs these automatically with dbt test.
- Dockerfile — Packages the dbt project into a Docker container so it can run alongside your other services in docker-compose.
The magic glue is dbt's ref() function. When model B references model A with {{ ref('model_a') }}, dbt knows to run A before B. It builds the entire dependency graph automatically — no orchestration script needed.
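A minimal illustration of how that reads inside a model file (the column list is made up for the example; the shipped models are richer):

-- models/marts/dim_customer.sql (illustrative)
-- Because of ref('stg_customers'), dbt schedules stg_customers before this model.
SELECT
    customer_id,
    company_name,
    country
FROM {{ ref('stg_customers') }}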
You write the SQL, dbt handles the orchestration, dependency resolution, and execution. Want to see a real example?
Yes! ReportBurster ships with a complete, ready-to-run dbt project that transforms the raw Northwind tables into a proper star schema. Let me walk you through what it creates — all with a single dbt run command:
Staging layer (5 views): These clean and rename the raw mirror tables — stg_customers, stg_products, stg_employees, stg_orders, and stg_order_details. They cast types, rename columns to consistent snake_case, and provide a clean interface.
Dimension tables (4 tables):
- dim_customer — enriched with continent derived from country (useful for geo-analytics)
- dim_product — product details joined with category information
- dim_employee — employee details with manager hierarchy
- dim_time — a generated date dimension for time-based analysis (year, quarter, month, day of week)
Fact table (1 table): fact_sales — approximately 8,000 rows, one per order line item, with foreign keys to all four dimension tables plus calculated fields like net_revenue and discount_amount.
Analytical views (2 views): Pre-built aggregations for common queries — revenue by customer, revenue by product category over time. These give you instant dashboards without writing SQL.
That's 12 objects total — 5 staging views, 4 dimension tables, 1 fact table, and 2 analytical views — all created with one command. The included schema YAML files define tests that validate referential integrity across all relationships.
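Once the run completes, analytical questions become simple joins over the star schema. Here is a sketch of a "net revenue by continent and year" query; it assumes the surrogate key columns are named customer_key and date_key, so check the shipped models for the exact names.

-- Key column names (customer_key, date_key) are assumptions
SELECT c.continent,
       t.year,
       SUM(f.net_revenue) AS net_revenue
FROM fact_sales AS f
JOIN dim_customer AS c ON f.customer_key = c.customer_key
JOIN dim_time AS t ON f.date_key = t.date_key
GROUP BY c.continent, t.year
ORDER BY c.continent, t.year;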
That's exactly how I'd learn it too! Copy first, simplify second, delete when done. Here's the approach:
Step 1 — Copy the project:
cp -r db/dbt db/dbt-2
Step 2 — Strip it down. Delete the models we don't need from db/dbt-2/:
- Remove stg_products, stg_employees, stg_order_details (keep only orders + customers)
- Remove dim_product, dim_employee, dim_time (keep only dim_customer)
- Replace fact_sales with a simpler fact_order_summary (one row per order, not per line item)
- Replace the 2 views with a single vw_customer_revenue
Step 3 — Simplify what remains. The preview below shows each file after simplification — the same staging → dimension → fact → view pattern, just with fewer moving parts:
Update the dbt_project.yml and _staging.yml / _marts.yml to match the simplified models. Once all files look like the preview, let me know and we'll run it! Remember — this is disposable. We'll clean it all up at the end.
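To give a flavor of the end state, here is one way the simplified vw_customer_revenue model could look. This is a sketch rather than the exact preview content, and it assumes fact_order_summary exposes customer_id and order_total columns.

-- models/marts/vw_customer_revenue.sql (sketch)
{{ config(materialized='view') }}

SELECT
    c.customer_id,
    c.company_name,
    SUM(o.order_total) AS total_revenue,
    COUNT(*) AS order_count
FROM {{ ref('fact_order_summary') }} AS o
JOIN {{ ref('dim_customer') }} AS c ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.company_name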
Let's bring it to life! Three steps: add a temporary Docker service, run it, verify the results — then clean up.
Step 1 — Add a temporary service to docker-compose.yml. Add a dbt-transform-2 service that builds from your db/dbt-2/ directory (same pattern as the existing dbt-transform service, just pointing to ./dbt-2).
Step 2 — Run it. Execute docker compose run --build dbt-transform-2 run. Watch dbt resolve the dependency graph and execute models in order. Look for green "OK" next to each model. Then run docker compose run dbt-transform-2 test to validate data quality.
Step 3 — Verify in ClickHouse. Query your analytical view directly to see the results.
About --build: it rebuilds the Docker image — you only need it the first time or when you change the SQL models. After that, the workhorse command is simply docker compose run dbt-transform run (no --build). This single command does all the transformation work.
In production, nobody runs this manually. The simplest approach is a good old cron job: 30 * * * * cd /path/to/db && docker compose run dbt-transform run. For more advanced needs (retries, alerting, dependency graphs), use orchestrators like Airflow, Dagster, or Prefect. The HTML preview above covers this in detail.
Step 4 — Clean up. This was a disposable learning exercise. Now that you understand the full flow, let's remove it:
# Remove the disposable project
rm -rf db/dbt-2
# Remove the dbt-transform-2 service from docker-compose.yml
# (delete the service block you added in Step 1)
The original db/dbt reference project remains untouched for future use. You now understand the full ELT pipeline: CDC for real-time replication, dbt for batch transformation!
CloudBeaver Configuration
Yes! CloudBeaver is a web-based database management tool — think of it as pgAdmin or DBeaver, but running entirely in your browser. It gives you a visual interface for browsing schemas, tables, running queries, and managing your databases without installing any desktop software.
CloudBeaver supports the same databases that ReportBurster works with — PostgreSQL, MySQL, MariaDB, ClickHouse, SQL Server, Oracle, IBM Db2, and more. Since you already have connections configured in ReportBurster, you can set up the same connections in CloudBeaver using the exact same host, port, database, and credentials.
Even better — I can help with that! I can read your ReportBurster connection configuration and give you the exact values to enter in CloudBeaver, so you don't have to hunt for them. The connections are stored in ReportBurster's XML config files, and I have access to those.
Want me to walk you through setting up a connection? Just tell me which database you'd like to connect to.
Let's set it up step by step. First, make sure CloudBeaver is running — you can start it with service app start cloudbeaver 8978 if it's not already active.
Step 1: Open CloudBeaver. Navigate to http://localhost:8978 in your browser. If this is your first time, you'll see a setup wizard — create an admin user (any username/password you like) and finish the wizard.
Step 2: Create a new connection. Click the + New Connection button in the navigator panel. Select PostgreSQL from the driver list. Now enter these values — I've extracted them from your ReportBurster configuration:
- Host: host.docker.internal (this is how containers reach your host machine)
- Port: 5432
- Database: Northwind
- User: postgres
- Password: postgres
A quick note about host.docker.internal: since CloudBeaver runs inside a Docker container, it can't use localhost to reach PostgreSQL on your machine. Docker provides host.docker.internal as a special hostname that resolves to your host — it's the standard way for containers to talk to services running on the host.
Click Test Connection to verify, then Save. If you prefer not to use the UI, I can also edit CloudBeaver's config files directly to add the connection — just let me know.
Let's verify it end to end. In the CloudBeaver navigator panel on the left, click on your PostgreSQL Northwind connection to expand it. You should see:
- Schemas — expand this, then click public
- Tables — you should see all the Northwind tables: categories, customers, employees, orders, order_details, products, shippers, suppliers, and more
Click on any table — say customers — and CloudBeaver will show you the data in a spreadsheet-like view. You can browse columns, sort, filter, and even edit data directly (if you have write permissions).
For the ultimate test, open the SQL Editor (click the SQL icon or press Ctrl+Enter on the connection) and run a query:
SELECT company_name, contact_name, city, country FROM customers ORDER BY company_name LIMIT 10;
If you see Northwind customer data in the results grid — you're all set! CloudBeaver is connected and working perfectly.
Now you have a full visual database manager at your fingertips. If you also set up the ClickHouse CDC pipeline, you might want to add your ClickHouse connection to CloudBeaver too — then you can visually compare the source and target databases side by side!
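A quick way to do that comparison is to run the same row count on both connections and confirm the numbers match (allow CDC a few seconds to catch up after fresh writes):

-- Run once against the PostgreSQL connection and once against ClickHouse
SELECT count(*) AS customer_rows FROM customers;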