BDR Documentation: Chapter 4. Node Management
BDR and UDR require different steps for setting up a node because BDR replication is all-to-all (mesh), whereas UDR replication is unidirectional. The two modes share many concepts, which are discussed below. The exact commands differ and are documented below under Subscribing a UDR node and Joining or creating a BDR node.
When a new BDR node is joined to an existing BDR group, or when a UDR node is subscribed to an upstream peer, the system must copy the existing data from the peer node(s) to the local node before replication can begin. This copy has to be carefully co-ordinated so that the local and remote data start out identical; simply running pg_dump yourself is not sufficient. The extension provides built-in facilities for making this initial copy.
There are two ways to join a new BDR node or create/subscribe a UDR node: logical copy or physical copy. After the initial copy is done there is no significant difference between physical and logical initialization of a BDR node, so the choice comes down to which setup method is quickest and easiest for your particular needs.
In a logical copy, a blank database in an existing standalone PostgreSQL instance is enabled for BDR or UDR via SQL function calls. The BDR extension makes a connection to an upstream node designated by the user and takes a schema and data dump of that node. The dump is then applied to the local blank database before replication begins. Only the specified database is copied. With a logical copy you don't have to create new init scripts, run separate instances on separate ports, and so on, because everything happens in your existing PostgreSQL instance.
In a physical copy, the bdr_init_copy tool is used to clone a user-designated upstream node. This clone is then reconfigured and started up as a new node before replication begins. All databases on the remote node are copied, though only the specified database is initially activated for BDR or UDR. (Support for multiple database join/subscribe may be added at a later date.) After a physical node join or subscribe, the admin will generally need to separately register the new PostgreSQL instance with the operating system so that it starts automatically, as PostgreSQL does not do this itself. You may also need to select a different PostgreSQL port if there is already a local PostgreSQL instance.
The advantages and disadvantages of each approach roughly mirror those of a logical backup using pg_dump and pg_restore vs a physical copy using pg_basebackup. See the PostgreSQL documentation on backup and restore for more information.
In general it's more convenient to use logical join when you have an existing PostgreSQL instance, a reasonably small database, and other databases on the upstream node that you do not want to copy or replicate. Physical join is more appropriate for big databases that are the only database in a given PostgreSQL install.
For the details, see Subscribing a UDR node or Joining or creating a BDR node as appropriate.
Note: Read Joining or subscribing a node before this section.
The SQL function bdr.bdr_subscribe (see Node management function examples) is used to receive changes from the database specified in the function parameters into the current database. Subscribing to another node using this function automatically copies the existing data in the database subscribed to.
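As a rough sketch of what a subscription looks like in practice, the call below assumes the UDR subscription function is bdr.bdr_subscribe and that it accepts the named parameters shown, as recalled from the BDR 0.9 function reference. The node name, host names, port and database name are hypothetical; confirm the exact signature under Node management functions for your version before relying on it.

```sql
-- Run in the blank local database that will receive the changes.
-- Assumption: the bdr extension and its btree_gist dependency are
-- installed on this PostgreSQL instance.
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE EXTENSION IF NOT EXISTS bdr;

-- All values below are placeholders for illustration only.
SELECT bdr.bdr_subscribe(
    local_node_name  := 'subscriber1',
    -- how this node reaches the upstream database to subscribe to
    subscribe_to_dsn := 'host=upstream.example.com port=5432 dbname=appdb',
    -- how this node connects back to its own local database
    node_local_dsn   := 'port=5432 dbname=appdb'
);
```

The initial data copy starts as part of this call, so no separate pg_dump step is needed.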
See also: Node management functions, bdr_init_copy.
Note: Read Joining or subscribing a node before this section.
In BDR, every node must have a connection to every other node. To make configuration easy, when a new node joins it automatically configures all existing nodes to connect to it. For this reason, every node, including the first BDR node created, must know the PostgreSQL connection string (sometimes referred to as a DSN) that other nodes can use to connect to it.
The SQL function bdr.bdr_group_create is used to create the first node of a BDR cluster from a standalone PostgreSQL database. Doing so makes BDR active on that database and allows other nodes to join the BDR cluster (which consists of one node at that point). You must specify the connection string that other nodes will use to connect to this node at the time of creation.
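A minimal sketch of creating the first node might look like the following. The parameter names local_node_name and node_external_dsn, and the helper bdr.bdr_node_join_wait_for_ready, are given as recalled from the BDR 0.9 function reference and should be checked against Node management functions; the node name, host and database name are hypothetical.

```sql
-- Run in the standalone database that will become the first BDR node.
-- Assumption: the bdr extension and its btree_gist dependency are installed.
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE EXTENSION IF NOT EXISTS bdr;

SELECT bdr.bdr_group_create(
    local_node_name   := 'node1',  -- placeholder name
    -- connection string that OTHER nodes will use to reach this node,
    -- so it must be resolvable and reachable from their hosts
    node_external_dsn := 'host=node1.example.com port=5432 dbname=appdb'
);

-- Block until this node reports that it is ready.
SELECT bdr.bdr_node_join_wait_for_ready();
```

Because later nodes use node_external_dsn to connect back, a value such as host=localhost would only work if every node runs on the same machine.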
Whether you plan on using logical or physical copy to join subsequent nodes, the first node must always be created using bdr.bdr_group_create.
Once the initial node is created every further node can join the BDR cluster using the bdr.bdr_group_join function or using bdr_init_copy.
Either way, when joining you must nominate a single node that is already a member of the BDR group as the join target. This node's contents are copied to become the initial state of the newly joined node. The new node will then synchronise with the other nodes to ensure it has the same contents as the others.
Generally you should pick whatever node is closest to the new node in network terms as the join target.
Which node you choose to copy from only really matters if you are using non-default replication sets. See the replication sets documentation for more information on this.
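For a logical join, the steps above can be sketched as follows. The parameter names (in particular join_using_dsn for the join target) and the bdr.bdr_nodes catalog columns are given as recalled from the BDR 0.9 reference and should be verified under Node management functions; the node names, hosts and database name are hypothetical.

```sql
-- Run in the blank database on the NEW node, after creating the
-- bdr extension there (assumed already done).
SELECT bdr.bdr_group_join(
    local_node_name   := 'node2',  -- placeholder name
    -- connection string other nodes will use to reach this new node
    node_external_dsn := 'host=node2.example.com port=5432 dbname=appdb',
    -- the single existing group member nominated as the join target
    join_using_dsn    := 'host=node1.example.com port=5432 dbname=appdb'
);

-- Block until the initial copy and catch-up have finished.
SELECT bdr.bdr_node_join_wait_for_ready();

-- Inspect the membership as seen from this node.
SELECT node_name, node_status FROM bdr.bdr_nodes;
```

Only the join target is named explicitly; connections to the remaining group members are configured automatically, as described above.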
See also: Node management functions, bdr_init_copy.