In part 1 of this post we looked at setting up PostgreSQL for backup and restore, including point-in-time restores. This post looks at the high availability options and how to use streaming replication with failover.
Postgres High Availability
High availability is another often-required feature. It differs from backups in that the slave servers are a live copy of the master server: there is no point-in-time recovery performed on the slave servers, and no restore time is needed to bring a slave server up-to-date with the master.
It should be noted we are only talking about the built-in high availability options. There are several plug-ins and add-ons available that attempt to address high availability in a multi-master set up, or which allow per-database failover and other features. From a built-in perspective there are two different high availability configurations:
- warm standby - the slave server receives updates from the master but is otherwise inaccessible. It can be promoted to master when the master server fails.
- hot standby - the slave server receives updates from the master and can service read requests only. Any writes must still be made to the master server.
There are two ways to get the transactions from the master to the slave, and hence four possible ways to set up high availability servers. The two ways to ship the transactions are log shipping and streaming replication. We will first cover setting up high availability using log shipping and then convert it to using streaming replication.
The terminology used by PostgreSQL for the two options for replaying transactions is “continuous recovery” and “streaming replication”.
Warm Standby – Set up with Continuous Archiving and Continuous Recovery
We can leave the master set up as per the log shipping configuration above. What we need to configure now is the slave PostgreSQL instance. This requires taking a base backup and setting up a PostgreSQL cluster that uses it as its data directory.
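The base backup can be taken with pg_basebackup. A minimal sketch, assuming the master is at 192.168.0.1, the replication user from part 1 is called replicator, and the slave's data directory is /var/lib/postgresql/9.4/main (adjust all three for your environment):

```shell
# Run on the slave, with its PostgreSQL instance stopped and the data
# directory empty. -X stream also copies the WAL generated during the backup,
# -P shows progress.
pg_basebackup -h 192.168.0.1 -p 5432 -U replicator \
    -D /var/lib/postgresql/9.4/main -X stream -P
```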
We will also have to set up our recovery.conf file and simply add the lines:
standby_mode = 'on'
restore_command = 'cp /var/lib/postgresql/archive/%f %p'
This puts the slave into continuous recovery mode: it will automatically apply WAL segment files as they arrive in the archive folder, meaning the slave stays up-to-date and there is no need to replay WAL files to perform a recovery.
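To confirm the slave is in recovery mode you can run the built-in pg_is_in_recovery() function against it:

```sql
-- Returns true on a standby that is replaying WAL, false on the master
SELECT pg_is_in_recovery();
```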
Hot Standby – Set up with Continuous Archiving and Continuous Recovery
To convert the warm standby to a hot standby which can receive queries you simply have to make one change to postgresql.conf on the master and another on the slave.
On the master, change the wal_level setting from archive to hot_standby.
On the slave, set the hot_standby option to on.
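The two changes amount to the following (both servers need a restart for wal_level and hot_standby to take effect):

```
# postgresql.conf on the master
wal_level = hot_standby

# postgresql.conf on the slave
hot_standby = on
```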
There are two ways to promote a slave to a master in the event of a failure on the master.
The manual way is to run pg_ctl promote on the slave.
Via a trigger file. In recovery.conf you specify a trigger_file for the slave to watch for.
When this file is detected the slave will automatically be promoted to master. The file can be written by a monitoring script when it detects that the master is down.
The final step to the pinnacle of high availability is to set up streaming replication so that we are not behind by an outstanding WAL segment. To set up streaming replication we:
first set up a replication user and configure security settings to allow streaming replication (see part 1 of PostgreSQL Backup/Restore and high availability with Postgres),
secondly we need to configure the recovery.conf on the standby server to make use of streaming replication. To do this we need to add the setting primary_conninfo.
finally we update the master postgresql.conf file
In a situation where we have stopped the master server, taken a file system backup and then started the slave, the recovery.conf file would, at a minimum, look like this:
- standby_mode = 'on'
- primary_conninfo = 'host=192.168.0.1 port=5432 user=replicator password=pass'
Updating the master requires making the following changes to postgresql.conf:
- wal_level = hot_standby
- max_wal_senders = 3
- listen_addresses = '*'
A final step is to set up a replication slot. This allows PostgreSQL to monitor the state of the replication and retain any WAL entries it has been unable to send to the slave: it keeps track of the outstanding entries and ensures they are neither recycled nor removed before being synced. Once again this is a two-step process:
On the master run the SQL command
SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
On the slave edit recovery.conf and add the line
primary_slot_name = 'node_a_slot'
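You can then check on the master that the slot exists and, once the slave has reconnected, that it is active:

```sql
-- restart_lsn shows the oldest WAL position the slot is still retaining
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
```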
We can combine the above with a traditional archive shipping set up to allow for point-in-time recovery, as long as we keep a copy of the base backup:
standby_mode = 'on'
primary_conninfo = 'host=192.168.0.1 port=5432 user=replicator password=pass'
primary_slot_name = 'node_a_slot'
It is quite possible to run streaming replication and continuous archiving at the same time. This allows for high availability and for point-in-time restores if that becomes necessary, so you may have both an archive_command and a replication user configured on the master.
Of crucial importance in these set ups is making sure that you do not lose any WAL entries, either during the base backup and configuration or during archiving/streaming. A missing WAL entry will make your backup/HA solution unusable.
PostgreSQL allows for monitoring of the archiving and streaming, for example via the pg_stat_replication view on the master.
Typically PostgreSQL will keep all WAL segments and entries until they can be shipped. In the case of a failed replication or archive this can lead to a build-up of WAL segments, and eventually your master server could run out of disk space, so it's a good idea to monitor the state of replication to avoid misfortune.
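As a minimal example of such monitoring, run on the master:

```sql
-- One row per connected standby; an empty result means nothing is streaming
SELECT client_addr, state, sync_state FROM pg_stat_replication;

-- Archiver progress and failures (pg_stat_archiver is PostgreSQL 9.4+)
SELECT archived_count, failed_count, last_failed_wal FROM pg_stat_archiver;
```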