Angry Vlad

System Admin

People often wonder what causes things like Vladfire and other acts where I pour more gasoline on the fire. I tend not to share what goes on in my professional life because, honestly, putting this type of stuff in writing hurts. Nobody likes to document their abhorrent failures or reinforce that, no matter how far we get in documenting and perfecting the process, things tend to go horribly wrong.

Such was the case yesterday. Let’s play a little game of “Guess how this goes wrong.” Here is the scenario: one of the customer’s servers needs to be reprovisioned, as they are ready to move to MySQL 5 and start implementing their old spaghetti PHP/MySQL code as stored SQL procedures.

Vlad: I would like to have one of the servers reimaged with CentOS 4.3, minimal install. The box in question is a small Dell SC420 server in … rack, it is the only one of the kind. It is labeled as … as far as I recall

Make the system one big ext3 partition filling up all available space (minus swap, of course)

Username: root
Password: …

Disable firewall
Disable SELinux
GRUB boot manager

IPADDR=…
NETMASK=255.255.255.128
GATEWAY=…

Connected to the … switch.

Please update the ticket when you unplug the server before you proceed – I want to make sure that the correct server is being reimaged and not a production one.

Now if you’re in IT you can probably see where this one is heading. So here is Marlon’s response:

Marlon: We will begin the install now and update you once the work is complete.

Thanks!

OK, sounds like he didn’t read the last part of the ticket, doesn’t it? Maybe I’m misinterpreting what “update the ticket when you unplug the server before you proceed” means, and perhaps I’m also misinterpreting what “once the work is complete” means, but I’m smart enough to clear it up. Here is my response:

Vlad: I just want to make sure the correct box is pulled off the rack – as far as I can tell this system is still online.

Now certainly Marlon will read this part and respond with something along the lines of “Not a problem, will update the ticket when I unplug it and before the imaging starts,” but instead I get one of these:

Marlon: Currently, we do not have CentOS 4.3 on hand. Would you like me to proceed with CentOS 4.2?

Am I talking to the wall here?

Vlad: Absolutely — but the server I wanted you to reimage is still online. Have you pulled it off the rack yet? I have to make sure that the correct one is pulled off because I no longer remember the label that was placed on it.

Now certainly at this point he understands what needs to be done here, case closed. Right? Right?

Kevin: Vlad,
The server should now be down. OS installation is in progress.

Umm. Not quite.

[root@… ~]# uptime
 14:29:11 up 20 days, 23:39,  0 users,  load average: 0.00, 0.00, 0.00

Yup, they reimaged a production server. What more can you possibly say?
