Friday 29 August 2008

Ron Brunton was so right about the ABC lying to him

I read this post over at Bolta about Ron Brunton and his "interesting" time on the ABC board, and when I read it, I thought - "Yep, been there and seen that".

I used to work in the IT department of a large government organisation. I was in what I guess might be termed the middle level of middle management. Senior enough to see and contribute to a few board papers, technical enough to still mix it with the frontline Dilberts.

We had a complete network catastrophe at one point, thanks to a worm that flooded every network link with traffic. The disaster was so bad, some distant country sites were off the air for over a week. Everyone in IT had to work stupidly long hours to clean it up, and even major sites like head office were down for a few days. Thankfully, it didn't get into what I would call our "operational" network - if it had, Sydneysiders would have had a rough week.

This story might take a while to tell, but in the end, you'll understand what Ron Brunton was on about.

The company in question had offices and sites all over the place, and we also had many server rooms scattered around the country. I was working back a bit one night - kind of the last one there - when the phone rang. It was the Network Manager. His monitoring system had picked up a spike in traffic from the site that I was at, and he was ringing in the hope that a Dilbert was still there and could look at it for him.

Well, a Dilbert was there - me. At this point, it was maybe 6.30pm. He asked me to look at a few servers, which I did, and I couldn't see anything obviously wrong. The network traffic buildup was getting worse, so he hot-footed it over from the site that he was at to my site. We spent about 10 minutes poking around and throwing possible theories about the cause at each other, but none of them panned out when we ran a few tests.

I finally suggested that we might have been hit by the Blaster worm. That was a long shot, as the worm had been around for a while at that point, it was well known, and we had been told in one management meeting after another by a certain window licker that every system had been patched to protect us against this worm. Our boss had raised patching against worms as an issue in meeting after meeting since the initial devastating release onto the internet of the Slammer worm, and we had been reassured time and time again that the patch had been applied. We'd been hit by the Slammer worm earlier in the year, so patching our systems was top priority.

At that time, there was no anti-virus protection against the worm. The anti-virus software companies were working flat out on a solution, and until they released one, having the latest anti-virus software with up to date virus definitions was no good. The only way to keep it out was to apply the patches from Microsoft.

Now my timeline might get a little tangled here, as it was 5 years ago, and I went through that period with very little sleep. But essentially it started in that server room with me and the other manager trying to figure out what the hell was going on. We'd had yet another management meeting that morning, where the window licker had assured us that all systems were patched, and stupidly, we had all believed him without checking for ourselves (he had a history of being economical with the facts).

So we rang him on his mobile, and got yet another assurance that everything - and I mean everything - had been patched. We had a system for automatically pushing out patches, and he told us that it had been merrily pushing it out since the patch was released. After that, I am pretty sure we rang our boss to let him know that we had a problem. When we suggested a worm, he poo-poo'd it, as we had been repeatedly told that the patch had been pushed out, so it just wasn't possible. That discussion went on for a while as the network manager tried to get approval to apply a meat cleaver to the network to isolate the problem - that was a drastic measure, and not one taken lightly.

Me, being the curious type, decided to test the assertion that we were fully patched. It's quite easy to check a computer to see what patches have been applied. I started at one end of the server room, and checked the servers one by one. The other manager started at the other end of the room. After 5 minutes, we came to the depressing conclusion that not one had been patched.

Not one.

So we rang fuckwit back, and asked him to clarify. We had a whole bunch of server rooms - let's call them A through G - and we were in room C. We asked whether he perhaps meant that he had patched only the servers in room A, which was the biggest. No, he responded, all were done. He was quite adamant about that. He had a system for pushing out the patches, and he said it had done its job (although he never provided the reports that we asked for in meetings showing what percentage of systems had been patched - which the system could provide).

That's when my rather fiery colleague exploded and told him that of the 70 servers in room C, none had been done. The worm was in our network, and spreading fast. By the time that call concluded, network links were becoming so congested, we were unable to remotely access other sites (the worm blasted out a huge amount of traffic, which meant that a few infected machines could quickly choke the network).

In short, we were fucked. One way to stop the infection spreading was to take sites off the air, but that meant getting into the network equipment remotely to shut the links down - and by then, the links were too choked to get into them remotely. We couldn't quarantine the uninfected sites. It was too late. If dickhead had told us at the start that he hadn't done any patching, we might have realised straight away what the problem was, giving the network team time to quarantine the infected sites and prevent a disaster from breaking out. As it was, his continual lying screwed us royally.

Afterwards, I figured that I should have known he hadn't done a damned thing. We had a monitoring system for our servers that detected when they were down, and installing the patch required a reboot. That would register on the monitoring system, and it had not picked up a reboot since the patch came out. He hadn't done anything because the reboots could only be done out of hours, and he was not the sort of guy to stay back until 11pm patching and rebooting servers, or to come in on Saturday and do it. I usually worked on Saturday, and I had never seen him in the office on a weekend. The penny should have dropped much earlier.

The other thing about applying patches back then is that on some occasions, they screwed the server you were patching. Any server administrator was stuck between the rock of getting wormed and the hard place of having angry users abusing you because one of the servers you patched just died and refused to restart. I don't know if he was too chicken to face an occasional dead server, or whether he was just plain lazy, but whatever his problem was, he lied to his fellow managers, and he lied to our boss. And he lied about this again and again and again.

But we were too busy to rip his head off - we had a network to clean up. The first thing we had to do was patch, by hand, every bloody server. Because every network link was choked, we couldn't download the patch from a central server like we normally did. We had to copy the patch onto a floppy or CD and go from server to server, applying it like we did back in the stone age.

The desktop staff had to visit every single PC in the company and do the same - there were over 5000 of them, scattered over more than 200 sites right across the country. Blokes got in their cars the next morning, packed enough clothes to last them the week, plus a CD with the patch, and simply started driving. One went north, another south and the last went west. They got home over a week later, utterly knackered, after visiting every two-bit outpost in our far-flung empire. They were less than impressed. Their users, many of whom were off the air for a week, were also less than impressed. We normally had a big backlog of work - that was now even bigger because we'd lost over a week, and we were so exhausted that a few days off were required before normal operations could start again.

Fuckwit tried to be nice to everyone, but he got the cold shoulder. I think someone told him to lie low for a while.

But here is the kicker, and the reason for my comparison with the ABC.

When it was all over, my boss had to write a report explaining what had happened. I thought it was pretty simple - fuckwit had failed to do his job and lied about it in our weekly meetings, then he'd lied about it when the infection broke out, and he continued to lie about it until we laid out the evidence of his slackness in front of our mutual boss. The only way to convince our boss was to take a screen dump from an unpatched system and stick it in front of him, and then tell him that was typical of all our servers.

Our boss tore fuckwit to shreds, and he finally admitted that maybe he meant he had only patched the servers that sat under his desk!

Our boss was a very loyal bloke - perhaps too loyal - for he proceeded to write a report that, whilst technically accurate, failed to mention any of the above. It talked about many other organisations being hit, like banks and so on, and the fact that no antivirus protection existed at the time (antivirus fixes were released a day or two into the outbreak, but we couldn't push them out due to the network links being choked, and you still had to patch the server or PC before the fix was effective), and so on. It talked about all the hard work our staff did to clean it up. But essentially, it was a crock.

I read the report and exploded. I put my feelings on the record by sending our boss an email saying that I could accept a lot of things that fuckwit had done, but I could never accept blatant lying. How could he expect me to work with someone that I could no longer trust? For that reason alone, I wanted the bastard sacked. He'd lied to us, he'd lied to our boss - that could not be tolerated.

My boss quietly ignored my rant, plus the rants from the other angry managers and our furious staff (who all knew the truth), and submitted a report that pretty much whitewashed the whole affair. For all I know, that report made it all the way to the Premier, since our organisation was an "essential services" one. It was really sensitive stuff.

How "sanitised" was the report?

There is a scene in Buffalo Soldiers where a tank crew shoot up heroin, and then go on a wasted rampage through a town before finally destroying a petrol station. The tank drives over the petrol pumps, crushing them, which results in petrol spraying everywhere before igniting in a huge fireball. The tank drives off, with the crew being none the wiser.

Imagine writing a report that said that the petrol station was destroyed by a "stray spark", whilst ignoring the fact that a heroin-addled crew drove a tank through the station in the first place. The report was truthful, insofar as leaving out the bit about the tank and the heroin gives you the full and complete picture of what actually happened.

Why did no one speak out?

I have no idea. Many of our users knew the truth - they'd ask the guys who came to fix their PC what was going on, and they didn't hold back from telling them the facts. Pretty much the entire company soon knew that fuckwit had fucked up, and many knew that he'd lied about it. Many that I spoke to just shrugged their shoulders when I mentioned fuckwit, since they'd had dealings with him, and they knew what he was like. This was simply added to the catalogue of "misfortunes" that he had inflicted on the company and those he worked with.

So, what happened in the end?

Fuckwit didn't get sacked - he got a pay rise.

Most of us involved in that fiasco later left the company. Fuckwit was retained. As far as I know, he is still there.

Ron Brunton doesn't know half of what goes on down in the engine room, and the crap that management spins from time to time.
