Have you ever done something so stupid, and had it become apparent to you in such a way, that you figured a quick death had better follow so you could at least get the Darwin Award?
Here is how someone learned the hard way why I say: “We suck.”
A little bit of background: a few months ago one of my partners, and a really great friend, had one of his monkeys open a support ticket that really had something urgent in it. We updated it a few times under the pretense that we were troubleshooting the issue, with no really useful data he could take to his client. We’ve learned better since then. Anyhow, the case finally boiled up to me and I fixed it, but the mistake was so stupid, so idiotic, that I was naturally the only one who would even check for it, because I was the only one dumb enough to have made a similar mistake in the past. After all, no sane / literate person would ever try something that dumb. So really, the triumph came from been there / done that, which really just means “I am so much dumber than I appear, put a warning sign on me.” Thing is, I was so ashamed to admit what the mistake was that I just updated the ticket saying: Fixed it, sorry, we just suck.
This was apparently quite a hit at his office, as his engineers used it for weeks on end. I got to hear it first hand at the conference and it apparently earned me quite a reputation over there. Got a Vlad problem? They suck.
The real problem was that even though we fixed it, he had nothing to take to the customer to justify the outage. What do you go back and say? “Sorry, the people we sent you to suck. But it’s all good now.” Of course not.
Fast Forward To Today..
Earlier today, during DFWVF, with me just getting ready for my afternoon ride, I get a phone call, on my cell phone no less, from the partner I just told you about, saying the following:
“Dude, I need to cash in a favor. I have a screaming customer on the line and ….”
So I ask him about the ticket, I look at the work request, I look at the order, I look at the configuration. Looks good. I mean, everything by the book. I do the nslookups, everything looks right. I tell him it’s all good on this end, say I’ll see what more thorough troubleshooting turns up, and ask him to give me a call back.
His engineer updates the ticket, gives me bounce details, boss calls back and says: “It happens in your cloud.” FMR.
So, I isolate a node and start troubleshooting. Is it in the access lists, in relay lists, in routing tables, in ExchangeDefender? grep for “pattern”, find it, cut and paste, start testing. (The more skilled among you are probably seeing where this is going… neither of us did.)
I try everything under the sun, it works, I call him insane and ask him to do a run on his side and I’ll tail the logs on this end and see if I can catch it. It takes foreeeever for it to hit our outbound relay so I sneak in a cheap shot about virtualizing Exchange… don’t do it kids.
Anyhow, I see his message go through and bounce. Wtf? Ok, clientadmin@<paste in my domain>… Test passes, no bounce.
WTF am I doing wrong? “Dude, I can’t replicate this. It’s broken, but you’re the only one that can break it.” His response: “Yeah, I hear that a lot lately.”
Fast forward 14 minutes. I look in every table, database, config file. What I am cutting and pasting, which is identical to what he is doing, works. When he does it, it fails. FMR. How? Then I finally decide to test with his email. Cut & paste the entire thing. Poof, NDR! WTF?
Let’s say for the sake of the argument that the name of the domain was vladh2o.com.
The problem? The customer had both vladh20.com and vladh2o.com. They only submitted the h2o.com one, not the h20.com one. All this time I was copying and pasting the one from our database, from our config files, from our routing files… and it worked, and I’ll be damned if there is a big difference between vladh2O.com and vladh20.com.
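For what it’s worth, this is the kind of check a machine is better at than my eyes. Here is a minimal sketch in Python of how a lookalike check might go. To be clear, this is not anything we actually run: the `skeleton` and `find_lookalikes` names and the tiny confusable-character table are made up for illustration, and a real table would need a lot more entries than zero/oh and one/ell.

```python
# Hypothetical table of characters that are easy to confuse on screen.
# Only two pairs here for illustration; this is nowhere near complete.
CONFUSABLE = str.maketrans({"0": "o", "1": "l"})


def normalize(domain: str) -> str:
    """Lowercase a domain and drop surrounding whitespace and a trailing dot."""
    return domain.strip().lower().rstrip(".")


def skeleton(domain: str) -> str:
    """Collapse confusable characters so lookalike domains compare equal."""
    return normalize(domain).translate(CONFUSABLE)


def find_lookalikes(candidate: str, known_domains: list[str]) -> list[str]:
    """Return known domains that look like `candidate` but are not the same."""
    target = skeleton(candidate)
    exact = normalize(candidate)
    return [
        d for d in known_domains
        if skeleton(d) == target and normalize(d) != exact
    ]


if __name__ == "__main__":
    provisioned = ["vladh2o.com", "example.com"]
    # The domain from the bounce is not provisioned, but a lookalike is.
    print(find_lookalikes("vladh20.com", provisioned))
```

Had I run the address from the bounce through something like this instead of pasting the domain out of our own database, the near-twin would have shown up right away.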
Yes, reading is fundamental. And, being from the great State of Florida, no child left behind -1, I failed that test.
That’s where you lost da ball game..
When I finally figured out what actually went wrong I was on the line with the boss and I just could not believe it. I didn’t tell him immediately, but I eventually explained. By my count we each spent an hour of the other’s time troubleshooting our lack of literacy. Add on to that however much time our guys spent. That was perhaps the most expensive support ticket ever entered in the system. I don’t know how much he makes but, as TI says, “imagine a lot.”
So I go back to my portal to update the ticket and that’s where, as my buddy Los says, I lost da ball game. I said something so dumb, so cheap, so low… that I will have my ass handed to me for it for as long as these people are alive. The ticket update was:
“Yeah, ignore this please. We were troubleshooting a domain that wasn’t added to ExchangeDefender to begin with. They added ****20.com but not ******2o.com which is the one that wasn’t working to begin with.
We need to pick some font that makes 0’s and o’s stand out a little bit better.
Also, it appears that *****2o.com accounts have been provisioned, thanks for the giddyup on that.”
No, dear Vlad, there is no font that can fix stupid. Oooooooooof.