Avoiding Fatigue-Related Errors
More and more these days I'm seeing IT folk feeling pressured to work longer and longer hours to fix problems when an outage hits or when a project is late. During an outage this is often seen as necessary and, in the case of a late project, the business almost expects it, even treating it as part of normal duties.
It needs to stop. IT people are just as guilty, attempting to live up to the stereotype of some sort of superhero who is there to help no matter the hour. While commendable, everyone needs to understand that this is just as unhealthy as drinking excessively and, eventually, such levels of fatigue will lead to a major problem, hence the term "fatigue-related error".
The GitLab outage at the start of the year is a clear case of fatigue-related error. While GitLab have been very open about the outage, it is somewhat telling that the "problems encountered" section completely ignores fatigue as a causal factor.
The trend I'm seeing is that the IT industry seems to be closely following the aviation industry (more on that in a later blog post, but the parallels are fascinating). One of the biggest causes of accidents in aviation is pilot error, and this is often caused by fatigue. Pilots are limited to a 12-hour day, sometimes 15 if the captain decides to exercise an option often called "discretion", so named because it is at the captain's discretion to use it or not.
In IT, an outage often occurs in the middle of the night or just as we are about to go home, project rollouts are often scheduled for the end of a long week and, when things go bad, there is an overwhelming need to do something, ANYTHING, to try to fix things. More often than not, this causes even more issues and leaves people fighting just to get things back to the state they were in when they first broke.
Everyone in IT needs to understand that it's OK to say, "We've been awake too long; fatigue is a risk." Companies need to be supportive of a healthy work-life balance. They need to understand that if an outage kept the IT department up for 20 hours, they are going to be thin on the ground for a few days.
The ability to speak out about concerns around fatigue needs to be encouraged, not shouted down as a weakness. It's time for the industry to grow up about such things.