Backup MX for Mailman, with Exim

This week a hard disk in one of my servers failed. It was replaced with just 3 minutes of downtime (kudos to OVH), and right now the RAID array is being rebuilt.

Since this is the first time this has happened to me, and the server runs a production Mailman service, I decided to take my time and set up a backup MX server. This is how I did it.

Keep in mind that I am using Gentoo; details may differ slightly on other distributions.

rsync

The main problem when setting up a backup MX is dealing with spam. Suppose the backup server accepts a spam message while the main server is down and forwards it once the main server is up again, but the main server rejects it because the recipient address does not exist. What then? If the backup server sends a bounce to the sender address, and that address was forged, we are producing backscatter (bounce messages delivered to innocent third parties), and our server will soon be blacklisted.

The point is that the backup server should refuse incoming email whenever the main server would. This means checking the recipient address. What I did was set up a cron job that rsyncs the lists from the main server to the backup server every n minutes.

On the main server I open rsync access for the backup server:

/etc/rsyncd.conf
[mailman]
comment = Mailman sync
path = /var/lib/mailman
list = no
uid = mailman
gid = mailman
hosts allow = IP-ADDRESS-OF-THE-BACKUP-SERVER
hosts deny = *

This is the script on the backup server:

/etc/bin/mm-sync.sh
#!/bin/sh
# Mirror the Mailman folders from the main server and mail a report to root.

LOG=/tmp/mm-sync.log
START=$(date)

# data (this first redirection truncates the log of the previous run)
rsync -avz --delete MAIN-SERVER::mailman/data /var/lib/mailman/ > "$LOG"
echo >> "$LOG"

# lists
rsync -avz --delete MAIN-SERVER::mailman/lists /var/lib/mailman/ >> "$LOG"
echo >> "$LOG"

# archives
rsync -avz --delete MAIN-SERVER::mailman/archives /var/lib/mailman/ >> "$LOG"
echo >> "$LOG"

END=$(date)

echo "$START" >> "$LOG"
echo "$END" >> "$LOG"
mail -s 'Mailman sync' root < "$LOG"

Note that I am syncing three folders: data, lists and archives. Strictly speaking, you only need to sync the lists folder to set up a backup MX. I sync everything so the backup server can replace the main server completely if things go really bad.

And this is the cron file:

/etc/cron.d/mm-sync
*/10 * * * * mailman /etc/bin/mm-sync.sh
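
If the backup server only needs to act as a backup MX, the script can shrink to the lists folder alone, as noted above. The following is a minimal sketch under that assumption. The function name sync_lists and the RSYNC, MAIN_SERVER and MM_DATA variables are placeholders of mine, not part of the script I actually run.

```shell
#!/bin/sh
# Hypothetical lists-only variant of mm-sync.sh (a sketch, not my real script).

RSYNC=${RSYNC:-rsync}                     # overridable for testing
MAIN_SERVER=${MAIN_SERVER:-MAIN-SERVER}   # placeholder, as in the script above
MM_DATA=${MM_DATA:-/var/lib/mailman}

# Mirror only the lists folder. The exit status is rsync's own,
# so the caller can decide what to do when the sync fails.
sync_lists() {
    "$RSYNC" -az --delete "${MAIN_SERVER}::mailman/lists" "${MM_DATA}/"
}
```

Calling sync_lists at the end of the file and checking its exit status would let cron report only failed runs instead of mailing a log every 10 minutes.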

Exim

The Exim configuration looks a lot like that of the main server (for instance, I set up the same blacklist checking rules). See my original post on setting up a Mailman service with Exim.

First I define the same variables and options as on the main server. Not all of them are really needed, but I prefer to keep the configuration of both servers as close as possible.

/etc/exim/exim.conf
# Mailman
MM_HOME=/usr/lib/mailman
MM_DATA=/var/lib/mailman
MM_UID=mailman
MM_GID=mailman
domainlist mm_domains=example.com : example2.com
MM_WRAP=MM_HOME/mail/mailman
MM_LISTCHK=MM_DATA/lists/${lc::$local_part}/config.pck

smtp_accept_queue_per_connection = 30

Now, these Mailman domains (mm_domains) are not defined as local domains, as they are on the main server, but as domains we relay to:

domainlist local_domains = @
domainlist relay_to_domains = +mm_domains
hostlist   relay_from_hosts = 127.0.0.1 : ::::1

Now comes the router:

# Mailman
mailman_router:
  driver            = dnslookup
  domains           = +mm_domains
  require_files     = MM_LISTCHK
  local_part_suffix_optional
  local_part_suffix = -admin     : \
         -bounces   : -bounces+* : \
         -confirm   : -confirm+* : \
         -join      : -leave     : \
         -owner     : -request   : \
         -subscribe : -unsubscribe
  transport         = remote_smtp
  no_more

This is very much like the router used on the main server. The differences are: it uses the dnslookup driver instead of accept; it uses the remote_smtp transport; and it adds the no_more option. In other words, the router is configured for remote delivery instead of local delivery.

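Last, we need to modify the dnslookup router so that it ignores the Mailman domains, since the mailman router has already handled them. This matters because an address that fails the require_files check is not rejected by the mailman router; Exim passes it on to the following routers, and without this exclusion it would be relayed anyway: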

dnslookup:
  driver = dnslookup
  domains = ! +local_domains : ! +mm_domains
  transport = remote_smtp
  ignore_target_hosts = 0.0.0.0 : 127.0.0.0/8
  no_more

You should test the router for both valid and invalid recipient addresses:

# exim -bt foobar@example.com
  router = mailman_router, transport = remote_smtp
  host example.com [AAA.BBB.CCC.DDD] MX=10
# exim -bt foobarrr@example.com
foobarrr@example.com is undeliverable: Unknown user
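
To predict what exim -bt should answer for a given address, the check made by the mailman router can be approximated in plain shell. This is only a rough sketch: the helper names list_name and list_exists are mine, the suffix handling imitates the local_part_suffix list above without reproducing Exim's exact matching rules, and the domain part is ignored.

```shell
#!/bin/sh
# Sketch: approximate the mailman_router acceptance test outside Exim.

MM_DATA=${MM_DATA:-/var/lib/mailman}

# Print the list name for a local part: lowercase it, then strip at most
# one of the suffixes listed in the router's local_part_suffix option.
list_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed -E \
        's/-(admin|bounces(\+.*)?|confirm(\+.*)?|join|leave|owner|request|subscribe|unsubscribe)$//'
}

# Succeed if MM_DATA/lists/<list>/config.pck exists for the address,
# which is what the MM_LISTCHK / require_files test looks for.
list_exists() {
    local_part=${1%@*}
    [ -f "${MM_DATA}/lists/$(list_name "$local_part")/config.pck" ]
}
```

For example, list_exists foobar-owner@example.com should succeed only once the foobar list has been synced to the backup server.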

The DNS

Now remember to add your backup MX server to the DNS configuration. This is what mine looks like:

$ host example.com
example.com has address AAA.BBB.CCC.DDD
example.com mail is handled by 10 smtp.example.com.
example.com mail is handled by 20 mx.example.com.

Note that the lower the number, the higher the priority. This is a little confusing at first.
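
To make the ordering explicit, here is a small sketch that sorts the MX records the way a sending server would try them. The function name mx_by_priority is mine, and it assumes the host output format shown above.

```shell
#!/bin/sh
# Sketch: order MX records as a sending server would try them.

# Read `host` output on stdin and print "preference hostname" pairs,
# lowest preference number (that is, highest priority) first.
mx_by_priority() {
    awk '/mail is handled by/ { sub(/\.$/, "", $NF); print $(NF-1), $NF }' | sort -n
}
```

It can be fed directly from the lookup, for instance with host example.com | mx_by_priority. The first line printed is the main server, the second the backup.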

Testing

Now, how do you test this? There is a wonderful utility for testing email servers, swaks. I did the tests from my workstation:

localhost ~ $ sudo emerge swaks

While running the tests, open a console on each server and watch the Exim log (tail -f /var/log/exim/exim_main.log). The first test checks that the backup server forwards email to the main server:

localhost ~ $ swaks --to foobar@example.com --from toto@gmail.com --server mx.example.com

The second test simulates downtime of the main server. To do this I use iptables on the main server to block connections from the backup server to the SMTP port:

smtp.example.com ~ # iptables -A INPUT -s BACKUP-SERVER-IP -p tcp --destination-port 25 -j DROP
localhost ~ $ swaks --to foobar@example.com --from toto@gmail.com --server mx.example.com

Wait a few minutes, and then open access again:

smtp.example.com ~ # iptables -D INPUT -s BACKUP-SERVER-IP -p tcp --destination-port 25 -j DROP
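
While access is blocked, the test message should be sitting in the queue of the backup server, and it should leave the queue once access is open again. A sketch for checking this on the backup server follows. The function names and the EXIM variable are mine; exim -bpc prints the number of queued messages and exim -qf starts a queue run that ignores retry times. Both normally need root or an Exim admin user.

```shell
#!/bin/sh
# Sketch: watch the backup server's queue during the downtime test.

EXIM=${EXIM:-exim}    # Exim binary; adjust the path if needed

# Number of messages currently waiting in the queue.
queue_count() {
    "$EXIM" -bpc
}

# Start a queue run that ignores retry times, then report how many
# messages are still queued afterwards.
flush_queue() {
    "$EXIM" -qf
    queue_count
}
```

Running queue_count during the simulated downtime should show the test message, and flush_queue after reopening access saves waiting for the next scheduled retry.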

That’s all folks! If something wasn’t clear, please drop a comment.
