SurgeMail Clustering Architectures

To achieve very high reliability, redundancy or scalability, SurgeMail can be configured using a combination of several clustering architectures. Each architecture has its own advantages and trade-offs, which you must weigh against your business needs. The clustering architectures include:

Live replication (Mirroring)

Using SurgeMail mirroring, you can set up two servers that are continually updated as live replicas of each other, so you can send mail in to either system and read mail back from either server. In this configuration there is no single point of failure: if there is a major hardware problem on either server, you can fail over to the second server with no interruption of service. Also, if one system goes down for maintenance, it resyncs automatically when it comes back online.
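Conceptually, the automatic resync after an outage is a two-way catch-up: each mirror copies over whatever messages the other received while they were apart. The Python sketch below illustrates only that idea; it is not SurgeMail's replication protocol. It assumes mailboxes modelled as dictionaries keyed by a hypothetical message ID, and it ignores deletions and flag changes, which a real resync would also have to propagate.

```python
def resync(local: dict[str, bytes], peer: dict[str, bytes]) -> tuple[int, int]:
    """Two-way catch-up between two mirrored mailboxes (illustrative model only).

    Each mailbox is modelled as {message_id: message_bytes}. Messages present
    on only one side are copied to the other. Deletions and flag changes are
    deliberately ignored. Returns (copied_to_local, copied_to_peer).
    """
    # Set differences of the key views give the IDs present on only one side.
    missing_locally = peer.keys() - local.keys()
    missing_on_peer = local.keys() - peer.keys()
    for message_id in missing_locally:
        local[message_id] = peer[message_id]
    for message_id in missing_on_peer:
        peer[message_id] = local[message_id]
    return len(missing_locally), len(missing_on_peer)


# Each mirror has received one message the other has not seen yet.
mirror_a = {"m1": b"hello", "m2": b"only on A"}
mirror_b = {"m1": b"hello", "m3": b"only on B"}
print(resync(mirror_a, mirror_b))  # (1, 1)
print(mirror_a == mirror_b)        # True
```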

Mirroring is the simplest and most cost-effective way to build a system with high reliability and high redundancy. It is particularly suitable if your mail load can easily be handled by a single server.

This failover can be done in a variety of ways, but the recommended methods are either router-based failover or manual switching of an extra floating IP address that is allocated to the primary server.

  • Router-based failover: If you use router-based failover, configure the router so that all consecutive connections from a single IP address are sent to the same server. This does not matter for POP and SMTP, but it is important for IMAP (and Webmail) connections. Alternatively, configure the router so that all connections go to one server and fail over to the second server if the first stops responding for a period.
  • Manual IP switching: Some people prefer to have more control over failover, or do not have routers capable of hardware failover. In this case, point your DNS record at an extra floating IP address that you manually allocate to one or other of the mirrored servers.
  • Other: There are some alternative failover options whose use is discouraged, such as switching DNS records (long delays) or using scripts to switch floating IPs (not reliable).
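The recommended router behaviour (keep each client IP on the same server, and move it to the other mirror only if its server stops responding) is configured in the router itself. As an illustration of the logic only, here is a Python sketch; the host names are hypothetical, and the healthy set stands in for whatever health check your router performs.

```python
import hashlib


def pick_server(client_ip: str, servers: list[str], healthy: set[str]) -> str:
    """Send a client to a preferred mirror, failing over only if it is down.

    The preferred server is derived from a stable hash of the client IP, so
    consecutive IMAP/Webmail connections from one address land on one server.
    """
    if not servers:
        raise ValueError("no servers configured")
    # hashlib gives the same result in every process; the built-in hash() of a
    # str is randomised per process and would break the affinity.
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    # Try the preferred server first, then the remaining ones in order.
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if candidate in healthy:
            return candidate
    raise RuntimeError("no healthy server available")


mirrors = ["mail1.example.com", "mail2.example.com"]  # hypothetical host names
first = pick_server("192.0.2.10", mirrors, set(mirrors))
# While both mirrors are up, the same client always gets the same server.
print(pick_server("192.0.2.10", mirrors, set(mirrors)) == first)  # True
# If that server stops responding, the client is moved to the other mirror.
survivor = next(s for s in mirrors if s != first)
print(pick_server("192.0.2.10", mirrors, {survivor}) == survivor)  # True
```

Hashing the source address, rather than round-robin, is what keeps a client's IMAP and Webmail sessions on one mirror.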

Considerations:

For more information on configuring mirroring, see the Mirroring FAQ and the Configuration examples help page.

Functionally split

SurgeMail can be functionally split across several servers. The main reason to do this is if your mail load is too large for one server (e.g. 40,000+ users) and/or you have a particularly heavy spam load or webmail client load.

You can mix and match which services each server supports, but a typical setup would be: two front-end systems for spam and virus filtering; a single mail system that stores local mail and provides POP and IMAP access to it; and one or more webmail systems that handle the webmail load and talk to the primary mail server over IMAP when necessary.

This is the most efficient way to implement a high-reliability system with a high level of scalability. Depending on your users' needs, it allows you to host up to 100,000 users on your primary mail server.

Other considerations:

  • A functionally split architecture can be combined with mirroring. You simply add one more system to the above architecture, which is a mirror of the primary mail system. As with mirroring alone, this removes the single point of failure (and the associated mail data loss) you would otherwise have if your main mail system failed.
  • Mail will continue to be accepted by the filter systems if there is a problem with your primary mail system.
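A front end that keeps accepting mail while the primary is unavailable is doing store-and-forward: it queues each message and retries delivery until the primary is back. The Python sketch below models that behaviour under simplifying assumptions; the class name and the deliver callback (a stand-in for the hand-off to the primary mail server) are hypothetical, not SurgeMail's API, and a real queue would be kept on disk.

```python
from collections import deque
from typing import Callable


class FilterFrontEnd:
    """Hypothetical front-end filter that queues mail while the primary is down."""

    def __init__(self, deliver: Callable[[str], bool]) -> None:
        # deliver(message) returns True once the primary mail server accepts it.
        self.deliver = deliver
        self.queue: deque[str] = deque()

    def accept(self, message: str) -> None:
        """Always accept from the sender, then try to pass the message on."""
        self.queue.append(message)
        self.flush()

    def flush(self) -> int:
        """Retry queued messages oldest first; stop at the first failure."""
        sent = 0
        while self.queue and self.deliver(self.queue[0]):
            self.queue.popleft()
            sent += 1
        return sent


primary_up = False
primary_inbox: list[str] = []


def deliver_to_primary(message: str) -> bool:
    # Stand-in for handing the message to the primary mail server.
    if primary_up:
        primary_inbox.append(message)
    return primary_up


front_end = FilterFrontEnd(deliver_to_primary)
front_end.accept("message 1")  # primary down: accepted and queued
print(len(front_end.queue))    # 1
primary_up = True              # primary mail server comes back
print(front_end.flush())       # 1
print(primary_inbox)           # ['message 1']
```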

For detailed configuration information, see the configuring functionally split cluster help page.

Shared storage cluster

SurgeMail can be configured in a more traditional shared storage cluster configuration, using an NFS (or other) shared storage device to provide standard mail services.

In this configuration, several servers all run SurgeMail, handle all mail services, and store users' mail on the same central storage. The incoming connection load is shared between all servers using an appropriate technique, typically a hardware-based load-balancing router.

This configuration has the advantage that it is truly symmetric, and you can easily add one or more servers if required. However, the shared storage cluster configuration has two significant disadvantages:

1) Less efficient: several optimisations that are normally done in memory (in particular quota handling and file locking) must be done on disk, increasing the disk IO load.

2) Some of SurgeMail's advanced features are not fully functional (e.g. Surgeplus calendaring).
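To see where the extra disk IO in point 1 comes from, compare an in-memory quota counter with one that several servers share over NFS. The Python sketch below (POSIX only) shows the shared case: every delivery has to open, lock, read, rewrite, sync and unlock a file. The one-file-per-user counter layout and the function name are hypothetical illustrations, not SurgeMail's on-disk format.

```python
import fcntl
import os


def charge_quota(usage_path: str, nbytes: int, limit: int) -> bool:
    """Add nbytes to a per-user usage counter kept in a file on shared storage.

    Returns False (and changes nothing) if the new total would exceed limit.
    On a single server this could be an in-memory integer; here every call
    costs several file-system round trips.
    """
    # Create the counter file if needed, without truncating an existing one.
    fd = os.open(usage_path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Exclusive POSIX record lock so two servers cannot update the same
        # counter at once; NFS normally coordinates these locks between
        # clients (unless the share is mounted without locking).
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            text = f.read().strip()
            used = int(text) if text else 0
            if used + nbytes > limit:
                return False
            f.seek(0)
            f.truncate()
            f.write(str(used + nbytes))
            f.flush()
            os.fsync(f.fileno())  # make the new total durable before unlocking
            return True
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "user1.usage")  # hypothetical counter file
        print(charge_quota(path, 600, 1000))  # True: usage is now 600
        print(charge_quota(path, 600, 1000))  # False: 1200 would exceed 1000
```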

For detailed configuration information, see the configuring shared storage cluster help page.

Domain split (Proxy mode for huge systems)

Proxy mode allows a domain to be split across several physical servers. This system allows both virtually unlimited scaling and three-layer security. Incoming POP/SMTP connections arrive at one of several front-end 'proxy' servers (running SurgeMail in proxy mode). These servers look up the user in the networked user database (via LDAP or our own TCPAuth module), which returns, along with the normal response, an extra response code of 'tohost=backend.host.name'. The proxy then redirects the user to the appropriate back-end system.
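On the proxy side, the key step is spotting the extra 'tohost=' item in the lookup reply. The exact wire format of that reply is not described here, so the Python sketch below assumes only that 'tohost=backend.host.name' appears as a whitespace-separated token in a text reply; the function name and sample replies are hypothetical.

```python
import re
from typing import Optional

# Assumed format: "tohost=<host>" appears as a whitespace-separated token.
TOHOST = re.compile(r"(?:^|\s)tohost=(\S+)")


def backend_for(auth_reply: str) -> Optional[str]:
    """Return the back end host named in an auth reply, or None if there is none."""
    match = TOHOST.search(auth_reply)
    return match.group(1) if match else None


# Hypothetical replies from the user database lookup.
print(backend_for("user ok tohost=backend3.example.com"))  # backend3.example.com
print(backend_for("user ok"))                              # None
```

A proxy would then open its own connection to the returned host and relay the client's session to it.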

So you might run 4 back-end systems, each with 100,000 users, and 2 front-end systems. To add more users, you just add as many front-end and back-end servers as needed to cope with the load.
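The scaling arithmetic is simple: with the example figures above, 4 back ends of 100,000 users each host 400,000 users, and each further 100,000 users needs one more back end. A minimal Python sketch, assuming a fixed per-back-end capacity (100,000 is only the example from the text; real capacity depends on your load). Front ends are sized separately by connection load and are not modelled here.

```python
def backends_needed(total_users: int, users_per_backend: int = 100_000) -> int:
    """Smallest number of back end servers that can hold total_users."""
    if users_per_backend <= 0:
        raise ValueError("users_per_backend must be positive")
    # Integer ceiling division: a partly filled server still counts as one.
    return -(-total_users // users_per_backend)


print(backends_needed(400_000))  # 4, the example above
print(backends_needed(650_000))  # 7
```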

Each user lives on only one of the back-end systems. The only component that has to handle all the users is the user database, and that is a relatively light task because each entry holds so little data. We recommend NWAuth or LDAPAuth, but any of the database back-end authentication modules would be suitable.

For detailed configuration information, see the configuring proxy mode cluster help page.

Three-tier model (Proxy mode for increased security)

In the three-tier model, proxy mode is used to split the system into three tiers (separated by a firewall), each with a different security level. The top tier of servers exposes webmail directly via HTTP. The middle tier exposes POP/IMAP via proxy systems, and the back-end systems are not directly exposed to the network at all.

Some telcos, in particular, require this structure for their mail systems.
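The tier separation can be thought of as a default-deny allow-list enforced by the firewall. The Python sketch below models one such rule set so that it can be checked; the tier names are illustrative, the ports are the standard ones (80 for HTTP, 110 for POP3, 143 for IMAP), and the rule letting webmail servers reach mailboxes through the proxy tier over IMAP is an assumption for illustration, not something stated above.

```python
# Hypothetical allow-list of (source tier, destination tier, port); anything
# not listed is denied. Ports: 80 = HTTP, 110 = POP3, 143 = IMAP.
ALLOWED_FLOWS = {
    ("internet", "webmail", 80),   # top tier: webmail exposed over HTTP
    ("internet", "proxy", 110),    # middle tier: POP via proxies
    ("internet", "proxy", 143),    # middle tier: IMAP via proxies
    ("webmail", "proxy", 143),     # assumption: webmail reads mail via the proxies
    ("proxy", "backend", 110),
    ("proxy", "backend", 143),
}


def is_allowed(src: str, dst: str, port: int) -> bool:
    """Default deny: a connection is permitted only if it is explicitly listed."""
    return (src, dst, port) in ALLOWED_FLOWS


def internet_exposed_ports(tier: str) -> set[int]:
    """Ports on a tier that the internet can reach directly."""
    return {port for src, dst, port in ALLOWED_FLOWS
            if src == "internet" and dst == tier}


print(internet_exposed_ports("backend"))        # set(): never exposed
print(sorted(internet_exposed_ports("proxy")))  # [110, 143]
print(is_allowed("internet", "backend", 143))   # False
```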

Combinations and trade-offs

As noted above, a combination of these architectures can also be used. Typical examples include:

  1. Functionally split cluster + backup of mail system using mirroring
  2. Proxy mode split cluster + backup of each backend mail system using mirroring

On their own, the clustering architectures compare as follows:

Feature                           | Mirror | Functionally split | Shared storage | Domain split (proxy)
----------------------------------+--------+--------------------+----------------+---------------------
Provides processing redundancy    | Some   | Yes                | Yes            | No (1)
Provides data redundancy          | Yes    | No (2)             | No (2)         | No (2)
Provides load sharing             | Some   | Yes                | Yes            | Yes
Provides for incremental upgrades | Some   | Some               | Yes            | Some
Use of basic mail features:
    SMTP                          | Yes    | Yes                | Yes            | Yes
    POP                           | Yes    | Yes                | Yes            | Yes
    IMAP                          | Yes    | Yes                | Yes            | Yes
    Webmail                       | Yes    | Yes                | Yes            | Yes
Use of advanced features:
    Surgeplus Filesharing         | Yes    | Yes                | Yes            | Yes
    Surgeplus Calendaring         | Yes    | Yes                | Some*          | Yes*
    Mailing Lists                 | Yes    | Yes                | Yes*           | Yes*

(1) Can be added by further splitting into functionally split or shared storage clusters.
(2) Can be added using mirroring.

* Some special conditions apply, e.g. the functionality may need to be set up on only one particular system in the cluster, or not all of the advanced functionality may be available.