This page is out of date. Please use our new website: https://surgemail.com

IP Failover

Note: This mechanism is NOT recommended. Failover should be a very rare event, so we recommend using a smart router or changing the IP manually instead. Scripts like the ones below are risky: in the months between setting them up and an actual problem occurring, the backup server can easily drift out of sync or go untested, and a trivial event could then trigger a failover to a server that is not properly configured.

Incoming connections are routed to a floating IP number, which is not formally assigned to any particular NIC but is configured dynamically by SurgeMail. This floating IP is monitored by two real servers, one configured as the master and the other as the slave. If the master server goes down, the slave takes over the floating IP. If the slave later detects the presence of the master, it relinquishes control of the floating IP. The master takes over the floating IP whenever it is not already assigned to a server.

This ensures that the floating IP number is reachable from the outside world at any given time, as long as one of the two servers is running. System Administrators are free to take down either machine for servicing, and SurgeMail will automatically reassign the floating IP number as needed.
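
The takeover rules above can be sketched as a small decision function. This is only one reading of those rules, not SurgeMail's actual monitor code: the name decide_action and its three arguments are hypothetical, and timing is ignored entirely.

```sh
#!/bin/sh
# Sketch of the takeover rules; decide_action is a hypothetical name.
# Usage: decide_action ROLE PEER_UP OWNER
#   ROLE     master | slave
#   PEER_UP  yes | no              (is the other server reachable?)
#   OWNER    self | other | none   (who currently holds the floating IP?)
# Prints one of: add, remove, none.
decide_action() {
    role=${1:-} peer_up=${2:-} owner=${3:-}
    case "$role" in
        master)
            # The master claims the floating IP whenever nobody holds it.
            if [ "$owner" = none ]; then echo add; else echo none; fi
            ;;
        slave)
            if [ "$peer_up" = yes ] && [ "$owner" = self ]; then
                # The master is back: give the floating IP up.
                echo remove
            elif [ "$peer_up" = no ] && [ "$owner" = none ]; then
                # The master is gone and nobody holds the IP: take it.
                echo add
            else
                echo none
            fi
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}
```

For example, decide_action slave no none prints add, which corresponds to the slave taking over after the master has gone down.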

How it works

Two physical machines are involved, each running the SurgeMail server itself as well as a monitor daemon. The servers receive failover configuration instructions via these tellmail commands:

tellmail failover add ip-number
tellmail failover remove ip-number

These commands are typically sent by the monitor daemon but can also be issued from the command line. They tell SurgeMail to execute shell scripts called failover_add.sh and failover_remove.sh respectively (on Win32 platforms, failover_add.bat and failover_remove.bat). These scripts must be placed in the SurgeMail installation directory on both the master and slave machines.

These scripts must be edited by the System Administrator, as they require system-specific parameters to be set.
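
When issuing the tellmail commands by hand, a thin wrapper can guard against typos. The sketch below is an illustration only: the function name floating_ip and the TELLMAIL variable are hypothetical, and only the "tellmail failover add/remove ip-number" syntax is taken from this page.

```sh
#!/bin/sh
# floating_ip is a hypothetical wrapper around the documented tellmail syntax.
# Usage: floating_ip add|remove IP
floating_ip() {
    action=${1:-} ip=${2:-}
    case "$action" in
        add|remove) ;;
        *) echo "usage: floating_ip add|remove ip-number" >&2; return 2 ;;
    esac
    [ -n "$ip" ] || { echo "missing ip-number" >&2; return 2; }
    # TELLMAIL (an assumption) can hold a full path if tellmail is not on PATH.
    "${TELLMAIL:-tellmail}" failover "$action" "$ip"
}
```

Calling floating_ip add 10.0.0.100 would then run tellmail failover add 10.0.0.100.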

The monitor daemons must also be configured via a failover.conf file, which contains four lines setting various parameters, like this (for the master):

failtime 30
ismaster true
aliasip 10.0.0.100
otherip 10.0.0.2

or this (for the slave):

failtime 30
ismaster false
aliasip 10.0.0.100
otherip 10.0.0.1

These config files set up a failover that shares the floating IP number 10.0.0.100 between a master server with IP number 10.0.0.1 and a slave server with IP number 10.0.0.2. Note that otherip on each machine is the address of the other machine. The floating IP is only transferred after failtime seconds (30 here), so that the system does not overreact to brief outages.
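
Because a missing or misspelled line in failover.conf is easy to overlook, it can help to check the file before relying on it. The following is a minimal sketch that assumes the whitespace-separated "key value" layout shown in the examples above; conf_get and conf_check are hypothetical helpers, not part of SurgeMail, and the monitor daemon's own parser may behave differently.

```sh
#!/bin/sh
# conf_get FILE KEY: print the value stored for KEY, or fail if KEY is absent.
conf_get() {
    file=$1 want=$2
    # The "|| [ -n "$key" ]" part also handles a last line without a newline.
    while read -r key value || [ -n "$key" ]; do
        if [ "$key" = "$want" ]; then
            echo "$value"
            return 0
        fi
    done < "$file"
    return 1
}

# conf_check FILE: verify that all four documented parameters are present.
conf_check() {
    cfg=$1
    for k in failtime ismaster aliasip otherip; do
        conf_get "$cfg" "$k" > /dev/null || {
            echo "$cfg: missing $k" >&2
            return 1
        }
    done
}
```

For instance, conf_get failover.conf aliasip prints the floating IP configured in that file, and conf_check failover.conf reports the first missing parameter on standard error.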

The Win32 batch files

On Win32, SurgeMail executes the failover_add.bat and failover_remove.bat batch files. Each contains a single line calling the ipalias.exe command. Additional commands, such as echoing to a logfile, can also be added. The ipalias.exe command works as follows:

ipalias -sold_ip -ialias_ip
Tells ipalias.exe to search all network interfaces for the first one bound to old_ip. It then attaches the alias IP alias_ip to that same interface. ipalias.exe is used this way in failover_add.bat. The alias_ip argument is passed into the batch file by SurgeMail.

ipalias -ralias_ip
Tells ipalias.exe to search all network interfaces for the first one with alias_ip attached to it. It then removes the alias from that interface. ipalias.exe is used this way in failover_remove.bat. The alias_ip argument is passed into the batch file by SurgeMail.

Typically, only the failover_add.bat file needs to be edited to provide the correct old_ip parameter. The ipalias.exe executable should be packaged with the batch files.

The Linux shell scripts

On Linux, SurgeMail executes the failover_add.sh and failover_remove.sh shell scripts. Each contains a single line calling the ifconfig command. Additional commands, such as echoing to a logfile, can also be added. Details on how ifconfig works can be found in its man page.

As ifconfig needs root access, you must run SurgeMail as root for these scripts to work. To do this, set the ownership of the SurgeMail home directory, e.g.

        chown root /usr/local/surgemail

ifconfig eth0:1 alias_ip
Tells ifconfig to bind the given alias_ip to the given ethernet adapter and unit number. The alias_ip argument is passed into the script by SurgeMail. Both a driver name and a unit number must be supplied.

ifconfig eth0:1 down
Tells ifconfig to unbind the ethernet adapter/unit number. The driver name and unit number must be the same as those specified in failover_add.sh.

Both shell scripts will need to be edited to provide the correct parameters to ifconfig. A full path may also need to be specified.
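
The one-line scripts described above could be fleshed out along the following lines, with logging and an explicit path. This is a sketch under several assumptions: that SurgeMail passes alias_ip as the first positional argument, that ifconfig lives at /sbin/ifconfig, and that eth0:1 is the right adapter and unit number for your machine. The variable names IFACE, IFCONFIG and FAILOVER_LOG and the log location are hypothetical.

```sh
#!/bin/sh
# Sketch only: paths, variable names and the interface label are assumptions.
IFACE=${IFACE:-eth0:1}                           # driver name and unit number
FAILOVER_LOG=${FAILOVER_LOG:-/tmp/failover.log}  # hypothetical log location

# Body for failover_add.sh: bind the alias IP to the interface.
failover_add() {
    [ -n "${1:-}" ] || { echo "usage: failover_add alias_ip" >&2; return 2; }
    echo "$(date) add $1 on $IFACE" >> "$FAILOVER_LOG"
    "${IFCONFIG:-/sbin/ifconfig}" "$IFACE" "$1"
}

# Body for failover_remove.sh: unbind the same adapter/unit number.
failover_remove() {
    echo "$(date) remove alias on $IFACE" >> "$FAILOVER_LOG"
    "${IFCONFIG:-/sbin/ifconfig}" "$IFACE" down
}
```

With these definitions, failover_add.sh would end with a call to failover_add "$1" and failover_remove.sh with a call to failover_remove. Keeping IFACE in one place makes it harder for the two scripts to disagree about the adapter and unit number.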

Download Scripts + Documentation

Version 1.0
download Microsoft Windows (9x,ME,NT,2000,XP) + Linux (libc6)