4.1. Node management common to both UDR and BDR

When a new BDR node is joined to an existing BDR group, or when a UDR node is subscribed to an upstream peer, the existing data must be copied from the peer node(s) to the local node before replication can begin. This copy has to be carefully co-ordinated so that the local and remote data start out identical, which is why it is not sufficient to simply run pg_dump yourself. The extension provides built-in facilities for making this initial copy.

There are two ways to join a new BDR node or create/subscribe a UDR node: logical copy or physical copy. Once the initial copy is done, there is no significant difference between a physically and a logically initialized BDR node, so the choice comes down to whichever setup method is quickest and easiest for your particular needs.

In a logical copy, a blank database in an existing standalone PostgreSQL instance is enabled for BDR or UDR via SQL function calls. The BDR extension connects to an upstream node designated by the user and takes a schema and data dump of that node. The dump is then applied to the local blank database before replication begins. Only the specified database is copied. With a logical copy you don't have to create new init scripts, run separate instances on separate ports, etc., because everything happens in your existing PostgreSQL instance.
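To make the logical route more concrete, the Python sketch below only assembles, in order, the SQL statements one might run in the new blank database; it does not connect to anything. The names `bdr.bdr_group_join` (with the named parameters `local_node_name`, `node_external_dsn` and `join_using_dsn`) and `bdr.bdr_node_join_wait_for_ready` reflect our understanding of BDR's SQL interface and should be checked against Node management for BDR; the helper functions, node names and DSNs are hypothetical.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal (doubles embedded quotes)."""
    return "'" + value.replace("'", "''") + "'"


def logical_join_statements(local_node_name: str,
                            local_dsn: str,
                            upstream_dsn: str) -> list[str]:
    """Return the SQL to run, in order, in the blank local database.

    Hypothetical helper. Assumes bdr.bdr_group_join() accepts the named
    parameters used below; verify against the BDR node management docs.
    """
    return [
        # Assumption: BDR needs btree_gist installed before the bdr extension.
        "CREATE EXTENSION IF NOT EXISTS btree_gist;",
        "CREATE EXTENSION IF NOT EXISTS bdr;",
        # Ask BDR to dump the upstream node and apply it to this database.
        "SELECT bdr.bdr_group_join("
        f"local_node_name := {sql_literal(local_node_name)}, "
        f"node_external_dsn := {sql_literal(local_dsn)}, "
        f"join_using_dsn := {sql_literal(upstream_dsn)});",
        # Wait until the initial copy is done and replication is running.
        "SELECT bdr.bdr_node_join_wait_for_ready();",
    ]


if __name__ == "__main__":
    for stmt in logical_join_statements(
            "node2",
            "host=node2 port=5432 dbname=bdrdemo",
            "host=node1 port=5432 dbname=bdrdemo"):
        print(stmt)
```

The statements would be run with psql or any client library while connected to the new, empty database; the `node_external_dsn` is how other nodes will reach this one, and `join_using_dsn` names the upstream node to copy from.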

In a physical copy, the bdr_init_copy utility is used to clone a user-designated upstream node. This clone is then reconfigured and started up as a new node before replication begins. All databases on the remote node are copied, though only the specified database is initially activated for BDR or UDR. (Support for joining or subscribing multiple databases may be added at a later date.) After a physical join or subscribe, the administrator will generally need to register the new PostgreSQL instance with the operating system separately so that it starts automatically, as PostgreSQL does not do this itself. You may also need to choose a different PostgreSQL port if a local PostgreSQL instance is already running.
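As a rough sketch of the physical route, the Python helper below builds a bdr_init_copy command line without running it. The flags used (`-D` for the new data directory, `-n` for the node name, `-d` for the upstream connection string, `--local-dbname` for the new node's own connection string) are assumptions based on our recollection of the bdr_init_copy reference and should be confirmed with `bdr_init_copy --help`; the helper name, paths and DSNs are hypothetical.

```python
import shlex


def init_copy_argv(data_dir: str, node_name: str,
                   upstream_dsn: str, local_dsn: str) -> list[str]:
    """Build a bdr_init_copy command line (hypothetical helper).

    Flag names are assumptions; check them against `bdr_init_copy --help`.
    """
    return [
        "bdr_init_copy",
        "-D", data_dir,                 # directory that will hold the clone
        "-n", node_name,                # name of the new BDR/UDR node
        "-d", upstream_dsn,             # upstream node to clone from
        f"--local-dbname={local_dsn}",  # how the new node will be reached
    ]


if __name__ == "__main__":
    argv = init_copy_argv(
        "/srv/pg/node2",
        "node2",
        "host=node1 port=5432 dbname=bdrdemo",
        # A non-default port avoids clashing with an existing local instance.
        "host=localhost port=5433 dbname=bdrdemo",
    )
    # Print a shell-safe rendering; subprocess.run(argv, check=True) would run it.
    print(shlex.join(argv))
```

Returning an argument list rather than a single string keeps connection strings containing spaces intact when passed to `subprocess.run`. Registering the resulting instance with the operating system's service manager remains a separate step.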

The advantages and disadvantages of each approach roughly mirror those of a logical backup using pg_dump and pg_restore vs a physical copy using pg_basebackup. See the PostgreSQL documentation on backup and restore for more information.

In general, logical join is more convenient when you have an existing PostgreSQL instance, a reasonably small database, and other databases that you do not want to copy or replicate. Physical join is more appropriate for big databases that are the only database in a given PostgreSQL installation.
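The rule of thumb above can be written down as a small decision function. This is only an illustrative sketch: the text gives no number for a "reasonably small" database, so the `SMALL_DB_BYTES` threshold is an arbitrary, hypothetical value, and the `Target` type and `suggest_join_method` function are invented for this example.

```python
from dataclasses import dataclass

# Hypothetical cut-off for a "reasonably small" database; pick your own.
SMALL_DB_BYTES = 50 * 1024**3  # 50 GiB, arbitrary illustration


@dataclass
class Target:
    has_existing_instance: bool  # is there a local PostgreSQL instance to reuse?
    database_bytes: int          # size of the database to be copied
    databases_on_upstream: int   # databases in the upstream instance


def suggest_join_method(t: Target) -> str:
    """Return "logical", "physical" or "either" following the rule of thumb."""
    small = t.database_bytes <= SMALL_DB_BYTES
    only_db = t.databases_on_upstream == 1
    if not small and only_db:
        # Big database, and a physical clone drags nothing else along.
        return "physical"
    if small and (t.has_existing_instance or not only_db):
        # Reuse the running instance and copy only the one database.
        return "logical"
    # Mixed signals: weigh copy time against cloning extra databases.
    return "either"


if __name__ == "__main__":
    print(suggest_join_method(Target(True, 2 * 1024**3, 3)))
```

The "either" case is deliberate: when the signals conflict (for example a large database that shares its instance with others), the trade-off between dump/restore time and cloning unwanted databases has to be judged case by case.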

For the details, see Node management for UDR or Node management for BDR as appropriate.