pg_easy_replicate Supports Schema Change Tracking During Logical Replication

I have been meaning to support common DDLs (Data Definition Language) for pg_easy_replicate for quite some time now and I am super stoked that it is now finally out. This new capability addresses one of the limitations of PostgreSQL’s native logical replication, bringing more flexibility to database migrations and replication through pg_easy_replicate.

What is pg_easy_replicate?

For those new to the project, pg_easy_replicate is a CLI orchestrator tool that simplifies the process of setting up and managing logical replication between PostgreSQL databases. It ensures zero data loss and minimal downtime, making it invaluable for tasks like major version upgrades, load testing with blue/green database setups, and more.

The DDL Tracking Challenge

One of the limitations of PostgreSQL’s native logical replication is its inability to replicate most common DDL changes, such as adding new columns (through ALTER) or creating indexes (through CREATE INDEX). This means that schema alterations made on the source database during replication aren’t automatically applied to the target database. In an active database environment with multiple engineers working on it, you may want to apply these changes during ongoing replication to keep both the source and target databases in sync, or perhaps after switchover - a pg_easy_replicate technique where, after changes are replicated, you can put the source database in read-only mode and move traffic to the target database.

Meet `ddl_command_end` and `sql_drop`

To overcome this limitation pg_easy_replicate now utilizes PostgreSQL’s event triggers to capture DDL events as they occur. Specifically, we’re tapping into three key event types (source):

ddl_command_end: Triggered after a DDL command is executed. This event fires at the end of any DDL operation, such as CREATE, ALTER, or DROP commands on database objects.
sql_drop: Triggered when a database object is dropped. This event provides more granular information about object removal, including cascaded drops.
table_rewrite: Triggered when a table is rewritten. This occurs during certain ALTER TABLE operations that require a full table rewrite, such as changing a column’s data type.

These triggers call a custom function that captures the details of each DDL event, including the type of operation, the affected object, and the full DDL command. This information is then stored in a dedicated audit table for later use.

The following DDL operations won’t be captured

Changes to roles and privileges (GRANT, REVOKE)
Database-level operations (CREATE DATABASE, DROP DATABASE)
Tablespace operations
Some system catalog modifications

These limitations are generally acceptable for most use cases, as users typically set up replication to replicating existing table data.

How It Works

When a DDL change occurs on the source database, the trigger function captures the details and stores them in the audit table. Users can then list these changes using the list_ddl_changes command:

$ pg_easy_replicate list_ddl_changes -g cluster-1
[
  {
    "id": 1,
    "group_name": "cluster-1",
    "event_type": "ddl_command_end",
    "object_type": "table",
    "object_identity": "public.users",
    "ddl_command": "ALTER TABLE public.users ADD COLUMN email VARCHAR(255)",
    "created_at": "2023-09-01 10:15:30 UTC"
  }
]

This command displays a JSON-formatted list of all captured DDL changes, allowing users to review the modifications before applying them to the target database.

When ready, users can apply these changes to the target database using the apply_ddl_change command:

$ pg_easy_replicate apply_ddl_change -g cluster-1 -i 1
DDL change with ID 1 applied successfully.

This command fetches the stored DDL commands and executes them on the target database, ensuring that the schema remains synchronized between the source and target.

Why This Matters

This new DDL tracking feature significantly enhances pg_easy_replicate’s capabilities. It allows for more dynamic and flexible management of database schemas during the replication process, addressing the common challenge of keeping schemas in sync when the source database undergoes structural changes during long-running replications or migrations.

What’s next

As always, I’d love to hear from the community on any feedback and real-world use cases where you find this useful.

I am also exploring more advanced concepts in PostgreSQL’s logical replication with my new side project pg_flo (UPDATE: https://github.com/shayonj/pg_flo), which you can glimpse here. pg_flo uses similar techniques for tracking DDLs and much more, replicating those changes in near real-time. If you’d like to learn more or jam on PostgreSQL things feel free to drop a note [email protected] or a DM on Twitter/X.

Happy replicating!