Introduction
Various replication modes are available with PostgreSQL.
In this article, a PostgreSQL 9.6 streaming replication is implemented on Linux Ubuntu 18.04 servers, it’s very easy. In the streaming replication mode, the standby connects to the primary, which streams WAL records (Write Ahead Log) to the standby as they’re generated, without waiting for the WAL file to be filled. Streaming replication allows a standby server to stay more up-to-date than is possible with file-based log shipping.
- The standby server can be in read only mode for reporting purposes for example.
- A replication slot ensures that the needed WAL files for the standby are not removed in the primary server before the standby server processes them. Multiple replication slots can be defined depending on the number of the standby servers.
(LSN : Log Sequence Number)
The procedure below is valid for PostgreSQL 9.6, 10 and 11. For PostgreSQL version 12 and above, the setup is slightly different and not discussed here. It is not important when configuring a PostgreSQL streaming replication, but default WAL files location is different on PostgreSQL 9.6 and PostgreSQL 10/11.
PostgreSQL 9.6 | $PGDATA/pg_xlog |
PostgreSQL 10/11 | $PGDATA/pg_wal |
The context is the following : 1 primary server, 1 standby server.
Primary server : VPSFRSQLPAC1
Standby server : VPSFRSQLPAC2
Binaries (PostgreSQL 9.6.15) : /opt/postgres/pgsql-9.6/bin
$PATH : /opt/postgres/pgsql-9.6/bin:$PATH
$PGLIB : /opt/postgres/pgsql-9.6/lib
Port : 30001
$PGDATA : /sqlpac/postgres/srvpg1
$CFG : /opt/postgres/dba/srvpg1/cfg
Configuration files :
$CFG/postgresql.conf
$CFG/pg_hba.conf
$CFG/pg_ident.conf
Controlling the PostgreSQL Server :
|
Preparing the primary server
System parameters
The primary server must be restarted, especially with PostgreSQL 9.6, to apply at least the following static parameters :
listen_addresses
:*
wal_level
: the wal level isreplica
for streaming replication.max_replication_slots
: at least 1 replication slot (1 standby). Higher values of replication slots if more standby or logical servers will be configured.max_wal_senders
: at least 3 wal senders (1 standby + 2 forpg_basebackup
). Higher values if more standby servers will be configured.
$CFG/postgresql.conf
listen_addresses = '*'
wal_level=replica
max_replication_slots=3
max_wal_senders=3
With PostgreSQL 10 and 11, the default values are already adjusted for replication. However, check the settings.
PostgreSQL 10 / 11 | Default values |
---|---|
wal_level
max_replication_slots
max_wal_senders | replica
10
10 |
postgres@vpsfrsqlpac1$ pg_ctl restart -D $CFG
Replication role
Create a role with the replication
privilege, this role will be used by the standby server to connect to the primary
server :
create role repmgr with replication login encrypted password '***********';
Add the role in the primary server file pg_hba.conf
with the standby IP address server, this will allow connections from the standby server.
Don’t forget to manage existing firewall rules.
$CFG/pg_hba.conf
host replication repmgr 51.xxx.xxx.xxx/32 md5
Here, SSL connections are not implemented.
Reload the configuration :
postgres@vpsfrsqlpac1$ pg_ctl reload -D $CFG
Replication slot
Create a replication slot in the primary server.
select * from pg_create_physical_replication_slot('standby1');
slot_name | xlog_position -----------+--------------- standby1 | (1 row)
select slot_name, restart_lsn from pg_replication_slots;
slot_name | restart_lsn -----------+------------- standby1 | (1 row)
The replication slot (restart_lsn
) will be initialized during the primary server backup
with pg_basebackup
.
Starting with PostgreSQL 11, it is not mandatory to create manually the replication slot, this one can be created and initialized with
pg_basebackup
.
Primary server backup (pg_basebackup)
The primary server backup is performed with pg_basebackup
.
postgres@vpsfrsqlpac1$ pg_basebackup -D /sqlpac/postgres/backup/srvpg1 \
-X stream \
--write-recovery-conf \
--slot=standby1 \
--dbname="host=localhost user=postgres port=30001"
Starting with,PostgreSQL 11, add the argument --create-slot
if the replication slot has not been previously created.
With the option --slot
giving the replication slot name : that way, it is guaranteed the primary server does not remove
any necessary WAL data in the time between the end of the base backup and the start of streaming replication.
When the backup is completed, the replication slot standby1
is then defined :
select slot_name, restart_lsn from pg_replication_slots;
slot_name | restart_lsn -----------+------------- standby1 | 0/33000000
The option --write-recovery-conf
(or -R
) writes a file recovery.conf
in the root backup directory.
This file will prevent any user error when starting the standby server, this file indeed indicates a standby server, and the slot name
is given :
/sqlpac/postgres/backup/srvpg1/recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=postgres host=localhost port=30001 sslmode=prefer sslcompression=1 krbsrvname=postgres'
primary_slot_name = 'standby1'
Standby server activation
Install the primary server backup previously performed in the standby data directory ($PGDATA
).
recovery.conf
Be sure the file recovery.conf
is installed in the standby server root data directory
with the option standby_mode = 'on'
and the replication slot name.
Update the connection info parameters to the primary server in this file.
$PGDATA : /sqlpac/postgres/srvpg1/recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=repmgr host=vpsfrsqlpac1 port=30001 password=************'
primary_slot_name = 'standby1'
postgresql.conf
If read only connections are allowed, check the parameter hot_standby
is set to on
on the standby server
(on
by default starting with PostgreSQL 10):
$CFG/postgresql.conf
hot_standby = on
Starting the standby server
Now the standby server can be started.
postgres@vpsfrsqlpac2$ pg_ctl start -D $CFG
When there is no error, in the standby server log file :
LOG: entering standby mode
LOG: redo starts at 0/33000028
LOG: consistent recovery state reached at 0/34000000
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/34000000 on timeline 1
The standby server is in recovery mode :
postgres@vpsfrsqlpac2$ psql -p30001
select pg_is_in_recovery();
pg_is_in_recovery ------------------- t
If the replication slot has not been defined and the needed WAL files removed in the primary server, an error occurs :
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000010 has already been removed
Testing replication
Create a table heartbeat
in the primary server, this table will be updated every minute :
postgres@vpsfrsqlpac1$ psql -p30001
create table heartbeat ( reptimestamp timestamp ); insert into heartbeat values (now()); select * from heartbeat;
reptimestamp ---------------------------- 2019-11-22 09:04:36.399274
Check the replication to the standby server :
postgres@vpsfrsqlpac2$ psql -p30001
select * from heartbeat;
reptimestamp ---------------------------- 2019-11-22 09:04:36.399274
Pause / Resume replication
To pause/resume replication, on the standby server :
postgres@vpsfrsqlpac2$ psql -p30001
PostgreSQL 9.6 | PostgreSQL 10 / 11 | |
---|---|---|
Pause replication |
|
|
Resume replication |
|
|
Replication paused ? |
|
|
Essential replication informations
This article does not deal with replication monitoring, however below the essential informations about the replication state.
Standby server : pg_stat_wal_receiver
In the standby, use the view pg_stat_wal_receiver
:
postgres@vpsfrsqlpac2$ psql -p30001
\x on; select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+----------------------------------------------------- pid | 2262 status | streaming receive_start_lsn | 0/97000000 receive_start_tli | 1 received_lsn | 0/99000920 received_tli | 1 last_msg_send_time | 2019-11-22 18:17:46.355579+01 last_msg_receipt_time | 2019-11-22 18:17:46.355835+01 latest_end_lsn | 0/99000760 latest_end_time | 2019-11-22 18:15:46.232277+01 slot_name | standby1 conninfo | user=repmgr password=******** dbname=replication host=vpsfrsqlpac1 port=30001 …
The WAL receiver process id is 2262 :
postgres@vpsfrsqlpac2$ ps -ef | grep 'postgres' | grep 2262
postgres 2262 32104 0 18:35 ? 00:00:04 postgres: wal receiver process streaming 0/99000920
Primary server : pg_stat_replication and pg_replication_slots
In the primary server, use pg_stat_replication
and pg_replication_slots
:
postgres@vpsfrsqlpac1$ psql -p30001
\x on; select * from pg_stat_replication;
-[ RECORD 1 ]----+----------------------------- pid | 6247 usesysid | 16384 usename | repmgr application_name | walreceiver client_addr | 51.xxx.xxx.xxx client_hostname | client_port | 41354 backend_start | 2019-11-22 09:35:42.41099+01 backend_xmin | state | streaming sent_location | 0/99000920 write_location | 0/99000920 flush_location | 0/99000920 replay_location | 0/99000840 sync_priority | 0 sync_state | async
\x on; select * from pg_replication_slots;
-[ RECORD 1 ]-------+----------- slot_name | standby1 plugin | slot_type | physical datoid | database | active | t active_pid | 6247 xmin | catalog_xmin | restart_lsn | 0/99000920 confirmed_flush_lsn |
The WAL sender process id is 6247 :
postgres@vpsfrsqlpac1$ ps -ef | grep 'postgres' | grep 6247
postgres 6247 5576 0 18:35 ? 00:00:00 postgres: wal sender process repmgr 51.xxx.xxx.xxx(41354) streaming 0/99000920
Conclusion
Installing a streaming replication with PostgreSQL 9.6 is very easy, maybe one of the easiest replication architecture.
Do not forget replication slots ! Only replication slots guarantee standby servers won’t run out of sync from the primary server.