Clusterlike Non-Clusters for SMB Messaging

Regardless of size, no business can afford to be without e-mail access these days. Combine that with the growing number of services we pile onto these mail servers and you’ve got the makings of a perfect SMB disaster. Mail servers are far more than just “mail”: they drive groupware functionality, faxing services, calendaring, mobile device security, and remote document and file access. We have consolidated and built a civilization on top of a single box, yet we haven’t raised our budgets and expectations to match a workload that used to be spread over multiple servers and workstations and was the sole job of a few part-time employees. This article helps you take a look at your small business infrastructure and realize the potential of the new cluster-like features in Microsoft Exchange 2007.

Let’s time travel for a bit. Back to 1997, when you likely first installed Microsoft Outlook and set up that shiny POP3 account. You’d pull your mail down to your workstation, where it sat and got backed up every so often. Perhaps you were one of the few who stored PST files on the local file server, where they were backed up daily. Then 2000 came around and you started keeping your mail on the server. Along with that new server you retired your old WinFax server and started using SBS 2000 as your faxing server, or perhaps got that efax.com account. Then 2003 came and you installed SharePoint for easy document access. You went on the road and started using Outlook Web Access 2003 when out of the office. The company mail boy went to college, you let go of the interoffice memo runner, and the receptionist got replaced by a brand new IVR (interactive voice response) voicemail system that dropped all missed voicemail directly into your inbox. In 2005 your company got a Windows Mobile device. Right now you’re looking at getting even more productivity by integrating Live Communications Server into your company so you can do instant messaging and live meetings with the branch office managers and cut down on travel time. Does this sound like your company?

If it helps you come clean, that is what happened to us. And yes, over the years the computer running all of this has gotten a lot more expensive: a lot more memory, hard drive space, backup systems and CPU resources went into it. Every time something bad happened we’d add in another building block. More hard drives. More memory. More processors. More SAN. SAN? SAN in Small Business? Yes, we attached more storage to the system to provide multiple daily snapshots of the mail system. Why? Because even though we kept putting more and more into our server to keep pace with growth, whenever we had a power interruption or a bad patch that didn’t properly reboot the server, the databases never dismounted cleanly. So while I sat there running isinteg and eseutil on my databases, I had staff behind me asking why it was taking so long, whether we needed to do something else, and how we could make sure this didn’t happen again.

Relax… Relax. The answer is simple. We just need to invest $25,000 into a failover cluster solution and we’ll be fine.

Awkward silence. The answers were always there; we all knew what the right answer was, but we just could not afford to pay for it. So when the Microsoft Exchange product group got us together at TechEd 2006 and explained the new LCR and CCR features, I wrote one note: “problem solved.”

LCR & CCR

Local Continuous Replication and Cluster Continuous Replication are two great new features of Microsoft Exchange 2007 that provide traditional cluster-like features without the cluster-shocker price. Disclaimer: the secret is in the hyphen. So let’s look at what this new stuff does.

LCR, local continuous replication, allows you to create a seamless copy of your storage groups. Microsoft Exchange 2007 transaction logs have gone down to 1024 KB in size (from 5 MB in earlier versions), and through the use of log shipping and log replay it’s very easy to maintain a second copy of your databases on another set of drives. There are two primary benefits: improved I/O performance and a shorter disaster recovery interval. In layman’s terms, you can create backups from the secondary copy of your storage groups without significantly impacting the primary set of drives. Additionally, if your database crashes at 10:00 AM, you can switch to the backup copy of your databases by 10:01 AM, not 3 hours later once the restore of the database from tape or USB drive has completed. You can read more about LCR here.
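To make log shipping and log replay concrete, here is a toy Python model. It is only a sketch under stated assumptions, not how Exchange implements LCR: the names StorageGroup, PassiveCopy and ship_and_replay are hypothetical, a “log file” is a list of four records standing in for the fixed-size transaction log, and each database is a plain dictionary.

```python
LOG_SIZE = 4  # records per log file (toy stand-in for a fixed-size log)


class PassiveCopy:
    """Second copy of the database, kept current by replaying shipped logs."""

    def __init__(self):
        self.data = {}
        self.shipped_logs = []   # log files copied to the second set of drives
        self.replayed = 0        # how many shipped logs have been replayed

    def replay_pending(self):
        # Apply every shipped-but-not-yet-replayed log, in order.
        for log in self.shipped_logs[self.replayed:]:
            for key, value in log:
                self.data[key] = value
            self.replayed += 1


class StorageGroup:
    """Active database plus its transaction logs and one passive copy."""

    def __init__(self):
        self.data = {}
        self.open_log = []       # log file currently being written
        self.closed_logs = []    # full log files, eligible for shipping
        self.passive = PassiveCopy()

    def write(self, key, value):
        # Record the change in the log, then apply it to the active database.
        self.open_log.append((key, value))
        self.data[key] = value
        if len(self.open_log) == LOG_SIZE:
            self.closed_logs.append(self.open_log)
            self.open_log = []

    def ship_and_replay(self):
        # Ship closed logs the passive copy does not have yet, then replay.
        already = len(self.passive.shipped_logs)
        for log in self.closed_logs[already:]:
            self.passive.shipped_logs.append(list(log))
        self.passive.replay_pending()


sg = StorageGroup()
for i in range(10):
    sg.write(f"msg{i}", f"body{i}")
sg.ship_and_replay()

print(len(sg.data))          # 10 records in the active database
print(len(sg.passive.data))  # 8 records: two full logs shipped and replayed
print(len(sg.open_log))      # 2 records still in the open log
```

In this toy model everything in a closed, shipped log is already in the passive copy, which is why switching to it is quick. Only the records still sitting in the open log have not reached the second copy.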

CCR, cluster continuous replication, relies on the same technology of log shipping and log replay with one important difference: it replays Exchange transaction logs on a different set of hardware. This way you can suffer a complete and total failure of your primary server and simply switch to the passive node once the primary goes down. Sounds similar to the traditional Exchange cluster deployments, doesn’t it? Well, it’s very different! It does not require a full cluster setup with Fibre Channel or direct-attached SCSI storage, no shared quorum disk, no high-end switches. Just two commodity servers will do. You can read more about CCR here.

Clusters are Expensive!

Yes they are. Clusters generally require very expensive shared storage devices, storage controllers to manage either SCSI or Fibre Channel storage arrays, high-end systems, etc. However, LCR and CCR do not require any of this high-end gear. You can use low-cost Dell PowerEdge servers (currently Dell has a PowerEdge 440 Dual Core Intel 2.8, 2GB DDR, 160GB SATA2 on sale for $580).

To give you an example, last night I built two baseline test servers to try out the LCR and CCR features of Exchange 2007. The total with shipping came to under $900 (about $400 for each system). What’s in it?

AMD Athlon 64 X2 3800 Windsor 2.0 GHz AM2 Dual Core Processor ($91)

ECS Socket AM2 Radeon Xpress 1100 MicroATX AMD Motherboard ($58)

Wintec AMPX (2 x 1GB) 240-Pin DDR2 800 (PC2 6400) Dual Channel Memory ($130)

Western Digital Caviar SE WD1600 JS 160 GB SATA 2 Drive ($53)

IN WIN Black Steel MicroATX Desktop Computer Case 240W ($50)

At first glance this system looks very basic. Let’s look at some of the objections. First of all: no redundant storage. Easy, not needed. The active node replicates its data to the passive node, so there is no need for a high-end storage investment. If you wanted to spend an extra $53 per node you could buy an additional hard drive and create a mirror. The motherboard supports it. Second, only 2GB of RAM? Yes, that’s the recommended amount of RAM for the mailbox role according to Microsoft. Not the minimum, recommended. If that’s not enough, the motherboard does support up to 16GB. You can load up the primary box with 4GB, for example, and keep the failover node with far less RAM because it would only be accessed when there is a failure. Third, motherboard and case. This is hard to overcome, so let me explain my choices. I invested in a fairly high-end processor for this setup and selected a cheaper case because as a test system this will sit on my workbench. The 240W power supply and low-footprint desktop case will make this a cool, quiet and small setup that I can use to get things going. For roughly $80 more per node you can also buy a 1U rackmount case from SuperMicro that would meet your data center racking needs.

Either way, even if you bumped up the RAM, mirrored the hard drives and bought a rack case, it would cost just over $1,000 total for both systems shipped. That is far less than what most small businesses are investing in their primary server. Why? Because we’ve been trained to buy the most expensive solution that money can buy and the budget can fit. We’ll grow into it, right? Well, no.. no.. wrong. You see, we’ve pushed the limits of processing very far. You get far more memory and processing power for the buck, but you still have that same lingering issue of the single point of failure. And you’re likely spending more for that single point of failure than you would for a redundant configuration!
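As a rough check on these numbers, the short Python sketch below adds up the part prices listed earlier and the two optional upgrades that have a quoted price (the $53 mirror drive and the roughly $80 rack case). The function and variable names are made up for illustration. Shipping and any extra RAM are left out because no prices for them are given here, so the results are lower bounds on the figures quoted in the text.

```python
# Part prices per node, in dollars, taken from the parts list above.
PARTS = {
    "AMD Athlon 64 X2 3800 processor": 91,
    "ECS Socket AM2 motherboard": 58,
    "Wintec 2 x 1GB DDR2 800 memory": 130,
    "Western Digital 160 GB SATA drive": 53,
    "IN WIN MicroATX case": 50,
}

NODES = 2              # one active and one passive server
MIRROR_DRIVE = 53      # optional second hard drive per node
RACK_CASE_EXTRA = 80   # approximate extra cost of a 1U rack case per node


def node_cost(mirror=False, rack=False):
    """Hardware cost of one node, excluding shipping and extra RAM."""
    cost = sum(PARTS.values())
    if mirror:
        cost += MIRROR_DRIVE
    if rack:
        cost += RACK_CASE_EXTRA
    return cost


base_total = NODES * node_cost()
upgraded_total = NODES * node_cost(mirror=True, rack=True)

print(node_cost())      # 382 per node
print(base_total)       # 764 for both nodes, before shipping
print(upgraded_total)   # 1030 for both nodes with mirror and rack case
```

The base figure of $764 leaves room for shipping under the $900 total, and the upgraded figure of $1,030 is in line with "just over $1,000" before RAM and shipping are added.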

The point of this article is to shift your thinking. We have the technology. We have the resources. We can make disaster recovery a lot more predictable and seamless for our clients if we’re willing to review these new technologies, implement them and move from “more is better” to “more independent systems are better”. If you have a small business, you have to let go of the big-iron enterprise view of high-end monster systems in favor of smaller, more distributed, more easily replicated systems. It will cost you less and let you sleep better at night. Hey, take one node home with you 🙂