)!).*:[!-(:.).-+.*)

ExchangeDefender

Got one of these gems forwarded to me by one of our customers earlier tonight, asking how we went about filtering something like that:

Subject: )!).*:[!-(:.).-+.*)

Sy!m]b*oool F]D:E(G
Last 0.04
T ar ge+t 0.12
!

Now, you can pretty much guess that with the amount of email we get, there is no single person sitting in front of a screen saying “SPAM”, “NOT SPAM”, “SPAM”. Even if there were, I’m sure they would make mistakes more often than the computer does. But fair question: how do we go about nuking something like the above?

I’m about to let you in on how we write rules to identify messages like the above.

First, look at the subject. What do you notice about it, just on the face of it? Just a bunch of characters, junk. Right. But look closely. What’s missing? A word. But how do I know what’s a word and what isn’t? Simple: you look for a continuous run of characters A-Z or a-z, or a mix thereof. Do you see any? Nope? OK, bump the score.
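Here is a minimal sketch of that first check in Python. It is an illustration, not our production rule: the function name `subject_lacks_word`, the choice to treat a "word" as a run of at least two ASCII letters, and the decision to skip empty subjects are all assumptions made for the example.

```python
import re

# Assumption: a "word" is a continuous run of two or more ASCII letters.
WORD_RE = re.compile(r"[A-Za-z]{2,}")


def subject_lacks_word(subject: str) -> bool:
    """Return True when a non-empty subject contains no run of letters."""
    stripped = subject.strip()
    if not stripped:
        # Assumption: empty subjects are left to other rules.
        return False
    return WORD_RE.search(stripped) is None


if __name__ == "__main__":
    for s in [")!).*:[!-(:.).-+.*)", "Meeting moved to Friday", "12345"]:
        print(repr(s), subject_lacks_word(s))
```

Note that a subject like "12345" also trips this check, which is one reason to give it only a small score.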

Second, and a lot less error-prone: what else is interesting about that subject? Well, there are no alpha characters in it. As a matter of fact, if you’ve received a few hundred of these you’d notice that none of them have any alpha characters in them! What are the odds that a legitimate piece of email would have a subject that wasn’t empty and yet didn’t contain a single alphabetic or numeric character? If the odds are good, you receive a lot of email from your dog or cat or whatever type of animal crawls over your keyboard. Even this rule can misfire, however. For example, what if you’re dealing with a CRM system in China? What if you deal with people who have no netiquette? (Ever had a customer send you a message with the subject “HELP !#$!%@#!%$!#^#!^!^!^!# me!”?)
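The second check could be sketched like this in Python. Again, this is an assumption-laden example rather than our actual rule: the name `subject_has_no_alnum` is made up, and the pattern only looks for ASCII letters and digits.

```python
import re

# Assumption: "alpha or numerical" means ASCII letters and digits only.
ASCII_ALNUM_RE = re.compile(r"[A-Za-z0-9]")


def subject_has_no_alnum(subject: str) -> bool:
    """Return True when a non-empty subject has no ASCII letter or digit."""
    stripped = subject.strip()
    if not stripped:
        # The rule only applies to subjects that are not empty.
        return False
    return ASCII_ALNUM_RE.search(stripped) is None


if __name__ == "__main__":
    samples = [
        ")!).*:[!-(:.).-+.*)",              # the spam subject
        "HELP !#$!%@#!%$!#^#!^!^!^!# me!",  # rude but legitimate
        "你好",                              # legitimate, but no ASCII letters
    ]
    for s in samples:
        print(repr(s), subject_has_no_alnum(s))
```

The last sample shows the false positive mentioned above: a subject written entirely in Chinese characters has no ASCII letters or digits, so this sketch flags it. A Unicode-aware test such as Python's `str.isalnum()` is one possible way around that, at the cost of a looser rule.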

Now, on their own, either rule can lead to some false positives, that is, legitimate messages being flagged as SPAM. So each rule carries a relatively low score, which makes a false positive less likely when only the subject or only the body happens to be garbled. When both rules match, the combined score is higher and we take our gamble on trapping it.
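To make the scoring idea concrete, here is a small Python sketch that adds up the two subject rules. The score values, the threshold, and the names `score_subject` and `is_spam` are hypothetical numbers and names picked for the example; they are not the values we actually run with.

```python
import re

# Assumptions: ASCII-only patterns, and a "word" is two or more letters.
WORD_RE = re.compile(r"[A-Za-z]{2,}")
ALNUM_RE = re.compile(r"[A-Za-z0-9]")

# Hypothetical scores: each rule alone stays below the threshold.
NO_WORD_SCORE = 1.5
NO_ALNUM_SCORE = 2.0
SPAM_THRESHOLD = 3.0


def score_subject(subject: str) -> float:
    """Sum the scores of every rule the subject matches."""
    stripped = subject.strip()
    if not stripped:
        return 0.0  # empty subjects are not scored by these rules
    score = 0.0
    if WORD_RE.search(stripped) is None:
        score += NO_WORD_SCORE   # rule 1: no word in the subject
    if ALNUM_RE.search(stripped) is None:
        score += NO_ALNUM_SCORE  # rule 2: no letters or digits at all
    return score


def is_spam(subject: str) -> bool:
    """Flag the message only when the combined score reaches the threshold."""
    return score_subject(subject) >= SPAM_THRESHOLD


if __name__ == "__main__":
    for s in [")!).*:[!-(:.).-+.*)", "12345", "Lunch on Friday?"]:
        print(repr(s), score_subject(s), is_spam(s))
```

With these made-up numbers, a subject like "12345" matches only the first rule and stays under the threshold, while the spam subject matches both and gets trapped.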

There, see how easy that is?

P.S. If it still has a garbled subject and a garbled body, and you still receive a lot of messages like that during your workday, you really need to find a better way to communicate with your cat while you’re at work. Teach her how to use IM or something.