PostgreSQL - Backup and High Availability/Fail-Over with Replication Part 1

PostgreSQL capabilities and features provide system administrators with the ability to configure their database servers for the common requirements of

  • backups and
  • high availability/fail-over
  • replication

The number of ways to configure PostgreSQL servers to perform these functions can be confusing. Additionally the configuration needed for the solutions build on what has gone before making the distinction between the different backup and high availability hard to appreciate at first.

This posts consists of 2 parts which focuses on the built-in support for these features and excludes 3rd party add-ons  that provide enhanced solutions.  Part 1 will look at backup options and part 2 will loo at high-availability and fail over.

The configurations describe below are valid as of PostgreSQL 9.5 but most of the options are available in earlier versions of Postgres.

  • Backups - Part 1

    • Periodic Online backups with utilities such as pg_dump,

    • Continuous archiving/backup with log shipping with point-in-time restore

  • High Availability(HA)/High Availability with Replication - Part 2

    • Warm standby server for fail over, (log shipping)

    • Hot standby server with single master for writes and reads enable slaves, (log shipping)

    • Streaming replication (Warm/Hot Standby)

This article is not meant to be exhaustive in the tuning that can be done for the various solutions . Its aim is provide an overview of the different approaches and to enable readers to get the basic solution configured. (In all likely hood there won’t be much tuning necessary in most cases.)

Backups

We will first look at pg_dump and then at “continuous archiving” in PostgreSQL nomenclature or “log shipping” as referred to by other database vendors.

Backup with PG_DUMP

The simplest approach to back up is to use the pg_dump utility. Pg_dump allows consistent backups to be made of a PostgreSQL database while online with a slight performance hit while the backup is running.

The typical approach with pg_dump is to backup the database once a day, usually after business hours.. The backup can be restored later with the pg_restore command. To find out more about pg_dump refer to the man pages. What I wish to do here is highlight some significant points that may be of interest to system administrators:

  • pg_dump can allow for the backup and restoration of a single table,

  • pg_dump can be used to backup a single database rather than the entire cluster as with other solutions, (Note a PostgreSQL instance is referred to as a cluster. I.e it is a server instance with potentially many databases.)

  • pg_dump can use multiple threads during the backup and restore to decrease backup/restore time,

  • it is the safest way to migrate a database between different major PostgreSQL versions. The utility pg_upgrade is also available but it upgrades all the databases in the entire cluster and, since it copies the data files, there is no option to roll back. Both of these solutions pg_ugrade and pg_dump requires downtime for a consistent upgrade. The amount of time to restore with pg_dump can be significant and should play a role in which strategy you select for your database disaster recovery. There are ways to attempt upgrades without downtime involving triggers and complicated set up but reliability is not guaranteed.

Pros:

  • Reliable, easy to implements,

Cons:

  • Can loose transactions between backup periods,

  • Long recovery times

Backup with Continuous Archiving/Log Shipping

The next step up is to use continuous archiving or “log shipping”. This involves a two step process. This approach allows for much smaller intervals between recoverable backup points than pg_dump, can be done online and has less of a performance impact but restoring or recovering from a disaster is more complex and can only be done at a cluster, not a database level.

There are two steps required to set up continuous archiving:

  1. Set up log shipping so log files, called WAL segments in PostgreSQL are continuously copied over to the archive server,

  2. Getting a base backup of the PostgreSQL cluster

Log Shipping Step 1

To set up the log shipping functionality to archive Write-Ahead-Log (WAL) segments we edit the postgresql.conf file, usually located under:

  • /etc/postgresql/<version>/<cluster name>/postgresql.conf or

  • /var/lib/postgresql/<version>/<cluster name>/

You need to locate the commented out lines below, uncomment and edit them to reflect the setting below. You could also just add them to the bottom of the postgresql.conf file.

wal_level = archive

archive_mode = on

archive_command = 'test ! -f /var/lib/postgresql/arhcive/%f && cp %p /var/lib/postgresql/arhcive/%f'

Of course you should change the path in the archive_command above to suit your set up. Make sure the target directory exists and the user who PostgreSQL is running as has write permissions to that directory. If there are problems the postgresql log file will let you know what wrong.

Log Shipping Step 2

Step 2 requires the creation of a base backup. There are two ways to do this manually or with the pg_basebackup script. Most should opt for the pg_basebackup option but this does require an additional step of setting up replication which can cause confusion when you first try and set up log shipping which is different to replication. So in order not to confuse things we will look at doing the base backup manually and then, once the concepts are understood, we will look at the steps necessary to set up replication for pg_basebackup

Manual Base Backup

The basic approach is to do a file system level copy of the cluster you wish to set up for log shipping.(Remember cluster means PostgreSQL instance i,e the data directory.)

To enable us to do a consistent recovery it is necessary is to make an entry in the write-ahead-log (WAL) file to mark the point at which the copy began. This is to enable a consistent recovery from the archived log files at a later date, as during the copy, inserts updates, deletes etc. will be continuing.. The copy is performed while the database is online.

In a client enter the following SQL commands

  1. SELECT pg_start_backup('label', true);

  2. Perform a filesystem level backup e.g. tar cvf pg_base_backup.tar (With modern file systems like ZFS or BTRFS this can be taken as a snap shot and the snapshot backed up.),

  3. SELECT pg_stop_backup();

You now have a continuous archiving solution in place. The WAL file segments are being copied to the archive location when they are closed out and a new segment created. A new WAL segments is created when the WAL segment reaches 16Mb in size. This is configurable and new segments can be forcibly created with pg_switch_xlog().

The speed with which the WAL log fills up depends on your server profile. There is also a time out setting in postgresql.conf, archive_timeout, that when reached will force a WAL segment write.

Log shipping does mean that in the event of disaster you could loose the last set of transaction in the WAL file that was not yet shipped but this will be limited depending on your settings.

Base backup with pg_basebackup

The utility pg_basebackup allows the base backup to be taken with a single command but, as noted above requires a replication account to be set up. The replication account is used during the base backup and thereafter is not used (unless you set up streaming replication – but more on that later).

Replication User Set Up Steps

For replication to be allowed there are two required steps:

  1. Set up a PostgreSQL role (user) with replication permissions (role) with one of the entries below depending on your security set up.

    • "create role replicator with login replication" or

    • "create role replicator with login replication encrypted password 'pass'"

  1. Set up host based security to allow the user to connect. By editing pg_hba.conf and adding one of the following entries depending on your configuration.

    • local replication replicator peer

    • host replication replicator 127.0.0.1/32 md5

  1. Edit postgresql.conf to set max_wal_senders. (max_wal_senders=3) is good to start if you are not using replication for warm/hot standby.

Finally once replication has been enable for the appropriate user you can run pg_basebackup as follows (once again depending on your security set up)

  • “sudo -u replicator pg_basebackup -D hotstandby -R --xlog-method=stream host=server01"

  • “pg_basebackup -D hotstandby -w -R –xlog-method=stream host=server01 user=replicator"

You have now set up archive replication. You can check if it is working by seeing if WAL segments files are being copied to your archive directory. You can also run

“select * from pg_stat_archiver” in a postgres client to see the state of replication. The log files are also handy when troubleshooting issues relating to archiving.

Pros:

* Allows for point-in-time restores,

Cons:

* Possibility of losing some data

Restoring Backups

Restoring with pg_restore

To restore from a dg_dump backup simple use the pg_restore command. Options available can be found in the pg_restore man pages.

Restoring from continuous archive/log shipping

To restore from a WAL file archive you will need to set up a PostgreSQL cluster with the data directory set to your base backup or if you are using your master server you will need to replace the data directory of your current PostgreSQL cluster. ( You will need to have stopped postgres first.)

Once the base backup is in place, before starting PostgreSQL, you will need to create a recovery.conf file. The file should be located in the data directory and at a minimum should have the following entry:

restore_command = 'cp /var/lib/postgresql/arhcive/%f %p'

This is simple a copy from the archive directory you set up in the archive_command above.If you want to do a point-in-time recovery add the following to the recovery.conf file:

recovery_target_time= '2016-11-14 08:22:00 SAST

Wrap Up

I hope the above has set out the configuration options for backup are retore in Postges in a clear and concise manner. It looks much more intense that it really is once you set up and configure it for itself. In part 2 - PostgreSQL and high availability and fail over with replication we look at the step needed to configure Postgres for such a solution.