On Friday I actually got into an argument with one of our partners about what is causing an issue with the sudden stop of mail flow from ExchangeDefender. It immediately prompted the “My name is Vlad and I’ll bet you $100 that this will fix it” support policy for anyone that wants to argue with me and consequently, “Vlad is no longer allowed to call partners to help them with technical issues” policy.
There is this new thing in Exchange (new as in it’s been there for 4+ years) called Backpressure. It’s documented here in great detail. In a nutshell:
Exchange 2007 and beyond come with a self-monitoring system called backpressure that watches memory and drive space on the hub transport role. If you start running out of either, Exchange will either temporarily or permanently stop accepting inbound mail.
Here is what it looks like from the outside:
telnet 220.127.116.11 25
Connected to clientserver (18.104.22.168).
Escape character is '^]'.
220 clientserver Microsoft ESMTP MAIL Service ready at Fri, 23 Jul 2010 12:20:28 -0400
250-clientserver Hello [22.214.171.124]
mail from: firstname.lastname@example.org
452 4.3.1 Insufficient system resources
Note: In order to check for inbound mail problems you should be using an SMTP diag. Your Exchange will still be functioning when the backpressure brakes kick in.
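If you would rather script the check than type it into telnet, here is a minimal sketch of the same probe in Python. The function names and the probe@example.org sender are made up for illustration. It only goes as far as MAIL FROM, because that is where the transcript above gets rejected.

```python
import smtplib


def is_backpressure_reply(code, message):
    """Return True when a reply looks like the 452 4.3.1 rejection shown above."""
    if isinstance(message, bytes):
        message = message.decode("ascii", errors="replace")
    return code == 452 and "4.3.1" in message


def probe_inbound(host, sender="probe@example.org", port=25, timeout=15):
    """Connect, greet, issue MAIL FROM, and return (code, message) of the reply.

    The sender address is a placeholder; nothing is actually delivered
    because the session is closed before RCPT TO / DATA.
    """
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        code, message = smtp.mail(sender)
        return code, message.decode("ascii", errors="replace")
```

Call probe_inbound() with the public address of the server you are testing and hand the result to is_backpressure_reply(). A True means the server is answering but refusing new mail, which matches the symptom in the transcript.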
If you’re an SBS user, make sure the volume on which Exchange resides has free disk space of at least 2x your RAM (or at least 10GB). If that is not immediately possible, turn off backpressure and restart your Exchange Hub Transport services. If you’re not on SBS and have a real Exchange setup with proper separation between your log / db / queue storage, make sure you have free space available on the volumes which hold your queues and your transaction logs.
Case 1: Infrequent Email Delays
Exchange clients who typically only complain about email delays during business hours, or have sporadic email delay issues are likely dealing with a low memory issue. As the server gets more and more abuse throughout the day, it is likely to exhaust all available memory and Exchange backpressure stops processing inbound mail temporarily.
When it does so, the senders are greeted with the 452 4.3.1 Insufficient system resources error message above. The message isn’t bounced / returned; the sending mail server will attempt the delivery again in the next few minutes (depending on configuration, server software, etc). ExchangeDefender is set to pound your server every 1 minute.
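To make the "not bounced, just retried" part concrete, here is a rough sketch of how a sending server might decide what to do with a reply code. This is not ExchangeDefender's actual code. The function name is hypothetical, and the 60-second default simply mirrors the 1 minute interval mentioned above.

```python
def next_action(code, retry_interval_seconds=60):
    """Map an SMTP reply code to what a sender would typically do next.

    Returns a (action, delay_seconds) tuple. 4xx replies are transient,
    so the message stays queued and is retried; 5xx replies are permanent,
    so the message is returned to the sender.
    """
    if 200 <= code < 400:
        return ("continue", 0)
    if 400 <= code < 500:
        return ("retry", retry_interval_seconds)
    if 500 <= code < 600:
        return ("bounce", 0)
    return ("unknown", 0)
```

A 452 lands in the retry branch, which is why backpressure shows up as delayed mail rather than bounced mail.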
Case 2: Frequent & Persistent Email Delays
This is related to the backpressure being triggered by low disk storage availability. Start nuking stuff. At best, you’ve just downloaded too much stuff and you’re physically out of space until you delete it. All mail flow will stop until you address the issue.
The more exotic event, in which you have something that temporarily stores data on your server that also holds your queues and transaction logs, find whoever hired you and have them hold your head in the toilet while they persistently flush and slam the toilet seat on your neck until you stop convulsing. Since that’s technically murder, you might have to do this on your own, make sure to put a heavy weight on a toilet seat.
The more exotic event is particularly frustrating because the delays are compounded. We had a partner whose client used the same volume for his backup jobs as well as for Exchange. At the end of the day he’d exhaust nearly all the server space, thereby shutting down Exchange. Once the backups were moved to the external device, the space was available again and the inbound mail resumed. Another had a client’s rendering software running on the server, which had a 10GB rendering scratch allocation on C:\. I’ll give you one guess where the queues were. See the toilet seat fix recommended above.
What about compounding? Well, if you have resource issues and are a heavy user of email, inbound mail itself will cause delays. There are only so many messages that Exchange hub transport can route at once so a sudden surge of mail can trigger delays all by itself.
In a nutshell
1. Don’t keep your queues and logs on the same drive.
2. If you can’t comply with #1, make sure you have a ton of RAM and hard drive space.
3. Make sure to check out Exchange 2007 Mailbox Server Role Storage Requirements Calculator: http://msexchangeteam.com/archive/2007/01/15/432207.aspx
4. If you are an MSP and aren’t alerting when the free hard drive space on your servers dips below 10GB (again, Vlad’s toilet seat fix is highly recommended), at least monitor MSExchangeTransport EventId 15002.
5. For temporary relief only, turn off backpressure.
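For the free space monitoring in #4, a minimal sketch of the check might look like this. The function names are hypothetical and the 10GB default comes from the number above. Wire the result into whatever alerting your monitoring tool already has.

```python
import shutil

TEN_GB = 10 * 1024 ** 3  # the 10GB floor mentioned above, in bytes


def measure_free_bytes(paths):
    """Return {path: free bytes} for each volume path, e.g. the queue and log drives."""
    return {path: shutil.disk_usage(path).free for path in paths}


def low_disk_volumes(free_bytes_by_volume, threshold_bytes=TEN_GB):
    """Return the sorted list of volumes whose free space is below the threshold."""
    return sorted(
        volume
        for volume, free in free_bytes_by_volume.items()
        if free < threshold_bytes
    )
```

Feed measure_free_bytes() the volumes that hold your queues and transaction logs, pass the result to low_disk_volumes(), and alert on anything it returns.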