The Nemesida WAF API module is the key link between all Nemesida WAF components, so its fault-tolerant operation must be ensured. This article explains how to do it.

Two IP addresses are used in this article: 10.0.0.1 (the primary server, used by default) and 10.0.0.2 (the replication server, used when the primary is unavailable).

Configuring the primary server

The primary server is designed for full operation: it receives data from all Nemesida WAF components and writes it to the database. Replication is activated on the primary server in a few steps:

1. Add an allow rule to iptables to access the PostgreSQL port (by default, 5432):

-A INPUT -p tcp --dport 5432 -j ACCEPT
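Depending on how the firewall is managed, the rule may also need to be applied to the running configuration and persisted across reboots. A possible approach on Debian-based systems (assuming the iptables-persistent package is installed):

```shell
# Apply the rule to the running firewall
iptables -A INPUT -p tcp --dport 5432 -j ACCEPT

# Persist the current ruleset across reboots (Debian/Ubuntu, iptables-persistent)
netfilter-persistent save
```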

2. Create a PostgreSQL user on whose behalf the data will be replicated, for example, repluser:

# su postgres
$ createuser --replication -P repluser

When prompted, set a password for the repluser user.
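To make sure the user was created with the replication privilege, you can query pg_roles (a quick check; adjust the role name if yours differs):

```shell
# Run as the postgres user on the primary; rolreplication should be t
psql -c "SELECT rolname, rolreplication FROM pg_roles WHERE rolname = 'repluser';"
```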

3. Make changes in the file /etc/postgresql/13/main/postgresql.conf:

...
listen_addresses = '*'
checkpoint_timeout = 30s
max_wal_size = 1GB
min_wal_size = 80MB
wal_level = replica
max_wal_senders = 2        # number of replication clients + 1
max_replication_slots = 2  # number of replication clients + 1
hot_standby = on
hot_standby_feedback = on
...

4. Make changes in the file /etc/postgresql/13/main/pg_hba.conf:

host replication repluser 127.0.0.1/32 md5
host replication repluser 10.0.0.2/32 md5

5. Add slot:

# su postgres
$ psql -c "SELECT pg_create_physical_replication_slot('slave01');"

slave01 – the name of the replication slot; the replication server refers to it via the primary_slot_name parameter during its configuration.

6. Restart PostgreSQL to apply the settings:

# service postgresql restart
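At this point it is worth verifying that the slot exists and the new settings took effect; a possible check (run as the postgres user):

```shell
# The slot created in step 5 should be listed; "active" becomes t
# once the replication server connects
psql -c "SELECT slot_name, slot_type, active FROM pg_replication_slots;"

# Confirm the WAL settings applied after the restart
psql -c "SHOW wal_level;"
psql -c "SHOW max_wal_senders;"
```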

Configuring the replication server

The replication server maintains minimal functioning of the Nemesida WAF API module («Read Only» mode) when the main Nemesida WAF API server is unavailable. In this mode, the Nemesida WAF API can serve data from the database (Nemesida WAF settings, behavioral models, etc.), but cannot write to it (e.g. new attack records or changes to Nemesida WAF parameters). The replication functionality is activated in a few steps:

1. Make changes to the configuration file /etc/postgresql/13/main/postgresql.conf:

...
data_directory = '/var/lib/postgresql/13/main'
hba_file = '/etc/postgresql/13/main/pg_hba.conf'
ident_file = '/etc/postgresql/13/main/pg_ident.conf'
cluster_name = '13/slave'
stats_temp_directory = '/var/run/postgresql/13-main.pg_stat_tmp'
port = 5433
max_connections = 1000
primary_conninfo = 'user=repluser password=<REPLUSER_PASSWORD> host=10.0.0.1 port=5432 sslmode=prefer sslcompression=0 krbsrvname=postgres target_session_attrs=any'
primary_slot_name = 'slave01'
hot_standby = on
...

2. Clear the data directory and copy the database from the primary server:

# su postgres
$ rm -rf /var/lib/postgresql/13/main/*
$ pg_basebackup -h 10.0.0.1 -U repluser -D /var/lib/postgresql/13/main --write-recovery-conf

3. Start the slave cluster after replication is complete:

# pg_ctlcluster 13 slave start

4. Check cluster status:

# pg_ctlcluster 13 slave status

5. Verify the replication settings by creating a test table on the primary server:

# su postgres
$ psql -c "CREATE TABLE test_table (id INT, name TEXT);"
$ psql -c "INSERT INTO test_table (id, name) VALUES (1, 'test');"

If everything is configured correctly, the created table test_table will appear in the replication server's database.
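The same check can be made from the command line; a sketch (run as the postgres user, note that the replica listens on port 5433):

```shell
# On the replication server: confirm standby mode and read the test table
psql -p 5433 -c "SELECT pg_is_in_recovery();"   # should return t
psql -p 5433 -c "SELECT * FROM test_table;"

# On the primary server: the replica should appear as a streaming client
psql -c "SELECT client_addr, state FROM pg_stat_replication;"
```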

Configuring Nemesida WAF API

After configuring replication, it is necessary to configure Nemesida WAF API to work in «Read Only» mode. To do this you need:

1. Make changes to the configuration file /var/www/nw-api/settings.py:

RO_MODE = True

2. Restart the Nemesida WAF API service:

# service nwaf-api restart

Configuring Failover

After configuring replication and the Nemesida WAF API on the replication server, you need to configure load balancing. To do this, set the request processing priority for each of the servers running the Nemesida WAF API, for example, as follows:


Load balancing scheme

The request is processed in several stages:

1. The Nginx web server with the Nemesida WAF dynamic module installed queries a DNS server to get the list of IP addresses of the load-balancing Nginx web servers.

2. The Round Robin DNS method selects one of the web servers to which the request will be sent. To provide fault tolerance, two independent loops are created, each consisting of its own Nginx web server, a Nemesida WAF API module and a PostgreSQL DBMS. Each Nginx web server distributes the load, for example, as follows:

Configuration example
upstream nw-api.example.com {
    server 10.0.0.1 max_fails=3 fail_timeout=30s;
    server 10.0.0.2 backup;
}

server {
    ...
    location / {
        proxy_pass http://nw-api.example.com;
    }
    ...
}

With this configuration, no matter which Nginx receives a request, it will first be sent to the server with the IP address 10.0.0.1, where the Nemesida WAF API module is running in standard mode.

3. If the server 10.0.0.1 is unavailable, the request will be sent to the server with the IP address 10.0.0.2, where the Nemesida WAF API module is running in «Read Only» mode.
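Failover can be exercised manually: stop the Nemesida WAF API service on the primary and repeat a request through the balancer. A sketch, assuming the balancer answers on nw-api.example.com:

```shell
# With the primary up, the request is served by 10.0.0.1
curl -s -o /dev/null -w "%{http_code}\n" http://nw-api.example.com/

# Stop the API on 10.0.0.1; after max_fails failed attempts within
# fail_timeout, Nginx routes requests to the backup server 10.0.0.2
service nwaf-api stop
curl -s -o /dev/null -w "%{http_code}\n" http://nw-api.example.com/
```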

Fixing synchronization errors

In cases of database synchronization errors, you can run a re-synchronization. For this you need:

1. Stop the PostgreSQL service or the slave cluster:

# service postgresql stop

or

# pg_ctlcluster 13 slave stop

2. Back up the pg_hba.conf file:

# cp /etc/postgresql/13/main/pg_hba.conf /etc/postgresql/13/main/pg_hba.conf.bak

3. Delete the out-of-sync database files:

# rm -rf /var/lib/postgresql/13/main/*

4. Start the re-synchronization:

# su postgres
$ pg_basebackup -h 10.0.0.1 -U repluser -D /var/lib/postgresql/13/main --write-recovery-conf

5. Make sure the new pg_hba.conf file is identical to the previous one:

# diff /etc/postgresql/13/main/pg_hba.conf /etc/postgresql/13/main/pg_hba.conf.bak

6. Start the PostgreSQL service or the slave cluster:

# service postgresql start

or

# pg_ctlcluster 13 slave start

7. Check cluster status:

# pg_ctlcluster 13 slave status
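Once the cluster is back up, replication health can be confirmed by checking the replay lag and WAL receiver state on the replication server (run as the postgres user):

```shell
# Replay lag should be close to zero shortly after resynchronization
psql -p 5433 -c "SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;"

# The WAL receiver should report status 'streaming'
psql -p 5433 -c "SELECT status, sender_host FROM pg_stat_wal_receiver;"
```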