Achieving High Availability with Docker Swarm, PostgreSQL and Django

The following methods working together deliver a High Availability solution for Django by maintaining state with PostgreSQL operating under Docker Swarm.
Postgresql
PostgreSQL is a widely used relational database that supports ACID transactions. The acronym ACID stands for atomicity, consistency, isolation, and durability. These are four key properties of database transactions that PostgreSQL supports to ensure the persistence and validity of data in the database.
One method PostgreSQL uses to maintain ACID properties is Write-Ahead Logging (WAL). PostgreSQL first records any transaction on the database to the WAL log files before it writes the changes to the database cluster’s data files.
Postgresql Continuous Archiving
With continuous archiving, the WAL files are copied to secondary storage, which has a couple of benefits. For example, a secondary database cluster can use the archived WAL file for replication purposes, but you can also use the files to perform point-in-time-recovery (PITR). That is, you can use the files to rollback a database cluster to a desirable point if an accident happens.
Postgresql Streaming Replication
Streaming replication is a popular method used to horizontally scale a relational databases. It uses two or more copies of the same database cluster running on separate machines. One database cluster is referred to as the primary and serves both read and write operations; the others, referred to as the replicas, serve only read operations. Streaming replication provides high availability of a system. If the primary database cluster or server were to unexpectedly fail, the replicas are able to continue serving read operations, or (one of the replicas) become the new primary cluster.
Docker Swarm Orchestration
Swarm constantly compares the desired network state against the current state and reconciles the two if necessary. If a primary database node fails, swarm reschedules its tasks into a different node in the same or, another datacenter. When the node recovers, it will become the replica database.
Django
Django doesn’t provide any solution for handling replication lag (i.e., query inconsistencies introduced because of the time taken for a write to propagate to the replicas).
For these reasons Django, will read and write to the Primary Server only.
High Availability Solution
Working together in this manner, as depicted below, the following occurs:
- Django writes to the Primary PostgreSQL database
- PostgreSQL continuously archives to the hosts Volume PGARCHIVE
- PostgreSQL continuously replicates to the hosts Volume PGDATA DC 2
- PostgreSQL continuously archives to the hosts Volume PGARCHIVE DC 2
In this manner, in normal network operation, there are four copies of the database in two different geographical locations at any one time.

Steady State
Should the database become unreachable, the Primary and Replica Databases are swapped and the method continues as before.

Failover
Implementation
The following postgres docker container entrypoint scripts are run when the container docks. These configure the PG Archiving, Streaming Replication, ensure that users exist, configure network security, and start the required processe at the right time. In this manner, the postgres docker image is extended to become either the primary or replication service. This then facilitates docker swarm to control the postgres cluster state.
Primary Script
#!/bin/bash if ! test -f "$PGDATA/postgresql.conf" then echo "⚙️ Initialising Database" su -c 'initdb -D /var/lib/postgresql/data/' postgres fi echo "▶️ Configuring this postgres to be the primary server" echo "⚙️ Setting network permissions for app" if ! grep -q '^host all all 10.0.0.0/8 trust' $PGDATA/pg_hba.conf then echo "host all all 10.0.0.0/8 trust" >> $PGDATA/pg_hba.conf fi echo "⚙️ Set network permissions for replicas" touch $PGDATA/pg_hba.conf chmod 600 $PGDATA/pg_hba.conf chown postgres:postgres $PGDATA/pg_hba.conf if ! grep -q '^host replication all 10.0.0.0/8 trust' $PGDATA/pg_hba.conf then echo "host replication all 10.0.0.0/8 trust" >> $PGDATA/pg_hba.conf fi echo "⚙️ Create/verify users and roles" psql --username postgres <<-EOSQL 2>/dev/null CREATE ROLE $PG_REP_USER WITH LOGIN NOSUPERUSER NOCREATEDB NOCREATEROLE NOINHERIT REPLICATION CONNECTION LIMIT -1 PASSWORD '$PG_REP_PASSWORD'; CREATE DATABASE $POSTGRES_DB; CREATE USER $POSTGRES_USER WITH PASSWORD '$POSTGRES_PASSWORD'; GRANT ALL PRIVILEGES ON DATABASE $POSTGRES_DB TO $POSTGRES_USER; EOSQL echo "⚙️ Enable archive mode" if grep -q '^archive_mode' $PGDATA/postgresql.conf then sed -i '/^archive_mode /s/=.*$/= on/' ${PGDATA}/postgresql.conf else echo "archive_mode = on" >> $PGDATA/postgresql.conf fi echo "⚙️ Set archive location" if grep -q '^archive_command' $PGDATA/postgresql.conf then sed -i '/archive_command/d' $PGDATA/postgresql.conf fi echo "archive_command = 'cp %p ${PGARCHIVE}%f'" >> $PGDATA/postgresql.conf echo "⚙️ Set max_wal_senders to 100" if grep -q '^max_wal_senders' $PGDATA/postgresql.conf then sed -i '/^max_wal_senders /s/=.*$/= 100/' ${PGDATA}/postgresql.conf else echo "max_wal_senders = 100" >> $PGDATA/postgresql.conf fi echo "⚙️ Set wal_level to replica" if grep -q '^wal_level' $PGDATA/postgresql.conf then sed -i '/^wal_level /s/=.*$/= replica/' ${PGDATA}/postgresql.conf else echo "wal_level = replica" >> $PGDATA/postgresql.conf fi echo "⚙️ Reloading..." psql --username postgres <<-EOSQL 2>/dev/null SELECT pg_reload_conf(); EOSQL echo "✅ This postgres is the primary server"
Replica Script
#!/bin/bas if ! test -f "$PGDATA/postgresql.conf" then echo "⚙️ Initialising Database" su -c 'initdb -D /var/lib/postgresql/data/' postgres fi echo "▶️ Configuring this postgres to be the replica server" echo "⚙️ Deleting existing database" rm -fr $PGDATA/* rm -fr $PGDATA/.* echo "⚙️ Wait for primary to be reachable" until ping -c 1 -W 1 $PG_PRIMARY_HOST; do echo "⏳ Waiting for primary to respond..." sleep 1s done echo "⚙️ Backup database from primary" echo "*:*:*:$PG_REP_USER:$PG_REP_PASSWORD" > ~/.pgpass chmod 0600 ~/.pgpass until pg_basebackup -h $PG_PRIMARY_HOST -D ${PGDATA} -U ${PG_REP_USER} --write-recovery-conf --progress --verbose; do echo "⏳ Waiting for primary to connect..." sleep 1s done echo "⚙️ Setting network permissions for primary" sed -i '/host replication/d' $PGDATA/pg_hba.conf echo "host replication all $PG_PRIMARY_HOST md5" >> $PGDATA/pg_hba.conf echo "⚙️ Enable archive mode" if grep -q '^archive_mode' $PGDATA/postgresql.conf then sed -i '/^archive_mode /s/=.*$/= always/' ${PGDATA}/postgresql.conf else echo "archive_mode = on" >> $PGDATA/postgresql.conf fi echo "⚙️ Set archive location" if grep -q '^archive_command' $PGDATA/postgresql.conf then sed -i '/archive_command/d' $PGDATA/postgresql.conf fi echo "archive_command = 'cp %p ${PGARCHIVE}%f'" >> $PGDATA/postgresql.conf echo "⚙️ Reloading..." psql --username postgres <<-EOSQL 2>/dev/null SELECT pg_reload_conf(); EOSQL echo "✅ This postgres is the replica server"