Setting up PostgreSQL Streaming Replication for High Availability and Backup
Keep your database running smoothly with PostgreSQL streaming replication.
Introduction
Setting up PostgreSQL streaming replication is an important step in ensuring high availability and backup for your data. PostgreSQL is a powerful open-source relational database management system that offers many features for high availability and backup.
There are a few steps involved in setting up streaming replication.
- Choose a master and standby server.
- Configure the master server.
- Configure the standby server.
- Start the replication.
In this article, we will cover how to set up PostgreSQL streaming replication synchronously for high availability and backup.
What is Streaming Replication?
Streaming replication is a method of replicating data between two or more servers. Standby servers can be kept current by reading a stream of write-ahead log (WAL) records from the Primary Server. This can be synchronous or asynchronous.
A standby server tracks changes made to a primary server, and can be promoted to a primary server if needed. A hot standby server can accept connections and serve read-only queries.
It is a popular method for ensuring high availability and backup because it allows for near-real-time replication of data. This means that if one server goes down, the data can be quickly and easily replicated to the other servers.
Getting Started
Requirements:
- Some familiarity with PostgreSQL and basic database concepts is helpful, but not required.
- At least two servers running preferably ubuntu, one as the Primary database cluster and another one as Standby.
Now that we have an understanding of the components required for PostgreSQL streaming replication, we will cover how to set it up.
Installing Postgres on Ubuntu
Ubuntu ships with a specific version of PostgreSQL that is then supported throughout the lifetime of that Ubuntu version. Other versions of PostgreSQL are available through the PostgreSQL apt repository.
If the version included in your version of Ubuntu is not the one you want, you can use the PostgreSQL Apt Repository. This repository will integrate with your normal systems and patch management, and provide automatic updates for all supported versions of PostgreSQL throughout the support lifetime of PostgreSQL.
This tutorial uses Postgres version 14. To use the apt repository, follow these steps:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install postgresql-14
The path can be exported or added to the .zshrc
or .bzshrc
file for automatic inclusion.
export PATH=$PATH:/usr/lib/postgresql/14/bin/
Configuring Streaming Replication
Synchronous streaming replication ensures that all data is replicated to the standby server before commit. Let’s configure the primary and standby nodes.
Primary Node
After you install Postgres, a new Postgres user is automatically created. You need to log in as the postgres user to access the database.
sudo -i -u postgres
Create a user called 'repuser' with the replication role, and set a password for the user.
--replication
- The new user will have the REPLICATION privilege.-P
or--pwprompt
- Will issue a prompt for the password of the new user.
createuser repuser --replication -P
Output:
Enter password for new role: # Enter any password
Enter it again:
You have successfully created a replication user. Now it's time to add some configurations by logging in as the root user.
exit
sudo -i
Open the Postgresql database configuration file using your favorite editor.
vim /etc/postgresql/14/main/postgresql.conf
Add the following configurations to your postgresql.conf
file:
listen_addresses = '192.168.64.23'
wal_level = replica
max_wal_senders = 10
wal_keep_size = 32
synchronous_commit = remote_apply
synchronous_standby_names = '*'
listen_addresses
- Specifies the hostname or IP address(es) on which the PostgreSQL server should listen for connections from client applications.wal_level
- Wal level is the level of logging for the write-ahead log. This must be set to replica or higher to allow connections from standby servers.max_wal_sender
- This parameter specifies the maximum number of concurrent connections from standby servers or streaming base backup clients. The default is 10. The value 0 means replication is disabled.wal_keel_size
- Specifies the minimum size of past log file segments kept in thepg_wal
directory, in case a standby server needs to fetch them for streaming replication.synchronous_commit
- Synchronous commit is a database transaction property that guarantees that once a transaction is committed, it will be visible to any subsequent transaction. It determines how much write-ahead logging (WAL) must be completed before the database server returns a “success” indication to the client. This is required if you want a synchronous replication!synchronous_standby_names
- Specifies a list of standby servers that can support synchronous replication.
For some additional parameters and for more information about the parameters mentioned above, consult the official documentation.
Now, open a new file called pg_hba.conf
stored in the database cluster's data directory. HBA stands for host-based authentication. This is the PostgreSQL client authentication configuration file. This file controls which hosts are allowed to connect to the server and how clients are authenticated.
vim /etc/postgresql/14/main/pg_hba.conf
Though a record can have several formats, here’s a simple format that we are going to use for our setup.
[record] [client_username] [client_database] [hostname/IPaddress] [client_authentication_method]
Add the authentication details for the standby server to allow the connection to the primary server. If you have more than one standby server, you can add them in the next line sequentially.
You can use this example as a template, but you will need to change the IP address of the standby server to match your own network configuration.
If you are using this database as a failover, the next line in the file allows connection from all clients with any IP, for any users, and any databases. If you are only using the server as a backup, you can skip this line.
host replication repuser 192.168.64.22/32 scram-sha-256
host all all 0.0.0.0/0 scram-sha-256
To learn more about the pbhba.conf file, refer to the official documentation.
Finally, restart the Postgres database by running the following command,
systemctl restart postgresql
Okay, next, let's set up the standby server.
Standby Node
Log in as the root user to avoid using sudo
.
sudo -i
Before continuing to the next step, stop the database by running the following command.
systemctl stop postgresql
We need to delete the initial default database cluster that was created when we installed the PostgreSQL database, which is located in the '/var/lib/postgresql/14/' directory. Make sure you are on the right server, or this will delete your existing database.
In this step, instead of deleting the default database cluster, I’m backing up the data directory by renaming it.
mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main_old
Log into the postgres
user to sync the replica database files from the primary node.
su - postgres
We will use the pg_basebackup
command to create a base backup of a PostgreSQL database cluster. This command will transfer the data from the primary database server using the PostgreSQL streaming replication protocol. This includes the WAL files, which are needed to reconstruct the database cluster. The backup will be stored in the ‘/var/lib/postgresql/14/main’ directory. This backup is taken without affecting other clients of the primary cluster database.
We will authenticate with the primary database cluster on the primary node using the 'repuser' and password.
pg_basebackup -R -h 192.168.64.23 -U repuser -D /var/lib/postgresql/14/main -P
Output:
Password: # Enter password for the 'repuser'
-R
or--write-recovery-conf
- Creates a standby.signal file and appends connection settings to the ‘postgresql.auto.conf’ file, easing the process of setting up a standby server.h
or--host=
- This specifies the hostname/IP of the server you want to connect to.-D directory
or--pgdata=
- Sets the target directory to write the output to, and it must be empty. This option is required.-U
or--username=
- Specifies the username to connect as.-P
or--progress
- This enables progress reporting.
For some additional parameters and for more information about the parameters mentioned above, take a look at the official documentation.
Exit the postgres user and return to the root user.
exit
It's time to edit the 'postgresql.conf' on the standby server!
vim /etc/postgresql/14/main/postgresql.conf
Update the following settings in your config file.
Don’t forget to change the following values,
listen_addresses
: change the IP address to your ownhost=
: change the IP address to your primary node.
listen_addresses = '192.168.64.22'
hot_standby = on
primary_conninfo = 'host=192.168.64.23 port=5432 user=repuser'
hot_standby
- This option means that the Postgres configuration allows for a standby server to be used in case of failure of the primary server.primary_conninfo
- This option specifies the connection string used to connect to the primary server in a streaming replication setup.
Start the Postgres database by running the following command.
systemctl start postgresql
Hurray! The streaming replication setup is now complete, and the data should begin streaming from the primary database cluster to the standby database cluster.
To check if everything is working, log in as the postgres
user on the ‘primary server’ and run the following command:
psql -c "select usename, application_name, client_addr, state, sync_priority, sync_state from pg_stat_replication;"
Output:
usename | application_name | client_addr | state | sync_priority | sync_state
---------+------------------+---------------+-----------+---------------+------------
repuser | 14/main | 192.168.64.22 | streaming | 1 | sync
(1 row)
Manual Failover
If the primary server/database fails, you will need to promote the standby server as the new primary server to accept read/write connections from the clients. This can be done by using the pg_ctlcluster
command as ‘sudo’ or ‘postgres’ user.
Syntax:
pg_ctlcluster [options] cluster-version cluster-name action
Example:
pg_ctlcluster 14 main promote
In the command above, "14" is the database cluster version, and "main" is the database cluster name.
[Optional] Automatic failover:
This article will not cover how to set up automatic failover, but a separate article will be written on how to set up a load balancer and automatically failover to the standby server in the event of primary server/database failure.
But if you're desperately looking to set up automatic failover, you can check out my GitHub for a quick and temporary solution.
Conclusion
In conclusion, setting up PostgreSQL streaming replication is an effective way to ensure high availability and backup for your data. By following the steps outlined in this article, you can easily set up your own replication system.
There are several methods of replicating data, but streaming replication is often seen as the best option for high availability and backup. This is because it offers the best combination of speed, flexibility, and security.
Additional Resources and References:
- https://www.postgresql.org/docs/current/high-availability.html
- https://www.postgresql.org/docs/current/auth-pg-hba-conf.html
- https://www.postgresql.org/docs/current/runtime-config-replication.html
- https://wiki.postgresql.org/wiki/Streaming_Replication