So, you made a big mistake at work… take a deep breath and read on. Many of us have made some really big mistakes at work, mistakes we like to call RGEs. A resume-generating event, or RGE, is a mistake so big that you need to get your resume out quickly because you may have just cost yourself a job. I’d like to think that everyone has only one of these moments in their work history, but they can actually be a great learning tool. You just have to know how to bounce back. Have you experienced one of these mistakes? Here’s a good real-life example to give you an idea of what I mean.
Years ago I worked with a gentleman who had previously worked for General Motors as a Linux administrator. This guy was a Linux whiz… but he wasn’t always. There’s a point in everyone’s career when you are a newbie and very green to the IT field. This is generally when these mistakes take place. He was asked by a manager above him to delete all the contents of a directory and create a new one. Working from ‘memory’, he quickly typed “rm -rf” at the command prompt, not realizing he wasn’t in the directory that his manager wanted him to remove. What happened next was life changing for him. As he told me, he could see all of the lights on the factory floor start to shut down and all of the machines come to a stop. After all was said and done, GM lost about $17 million: the outage was estimated to cost $1 million a minute, and it took roughly 17 minutes to get back up.
So, how do you bounce back from a mistake like this? More importantly, what are the first steps to triage this mistake and get things back up and running? I’m going to lay out a few things you can do to stop the bleeding quickly and bounce back from a huge mistake, even a multimillion-dollar one.
1. Take Ownership of Your Mistake ASAP
I believe the most important thing to do when you have made a massive mistake is to take ownership immediately, if in fact you did mess up. This does two things for you. First, it oddly helps to calm you down and direct your focus to remediating your mistake. Second, it shows your manager that you are forthcoming and honest, and that you take responsibility even if it could cost you your job. Act quickly by taking ownership of your mistake, then move on to figuring out how to fix it.
2. Alert Anyone and Everyone Who Will Be Impacted
Triage, triage, triage! The first thing that happens when a first responder comes on the scene of a disaster is triage. Figure out who is affected and decide who is affected the most. Get your manager on the phone or run to their office. Alert your colleagues and get several sets of eyes on the problem. With your manager and colleagues, decide which service, application or server will be affected the most and contact the stakeholders of that service. Tell them what happened, why it happened, your plan of action to bring it back up and, most importantly, how long you expect the service, application or server to be down. Performing calmly in this situation is a key skill to learn, so that if someone else makes a similar mistake you can help out, and if you make another big mistake you know how to react productively.
3. Don’t Jump Ship!
If you made the mistake, you need to stay with it until it is resolved. This could result in a 24- or even 36-hour shift to get things back up and operating. Getting frustrated or embarrassed and shrinking away is not an option at this point. It’s time to focus and show your willingness to stick with the issue until it’s resolved. There may be times when you can’t fix it and it has to be escalated to another group of individuals to bring it back up. This isn’t an invitation to check out and get some sleep. Keeping communication open and being there for questions the next group might have shows resolve.
4. Write a Detailed AAR
An after-action report, or AAR, is crucial in diagnosing what went wrong and how it was resolved. The language should read something like this: “At 4:16 pm on March 10th, I received a verbal request to delete XYZ directory and all files in it. At 4:23 pm, I logged into the machine via command line and issued a command to remove the directory; however, I wasn’t in the XYZ directory at the time, which resulted in a complete loss of…” Giving plenty of details and timestamps shows that you have a firm grip on what happened, why it happened and what steps you took to resolve it, including all parties involved in resolving the issue. At the end of the AAR, it’s imperative that you include a “Next Steps” section if there are still steps remaining to achieve stability.
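If you like to keep notes as you go, a timestamped timeline like the one described above is easy to assemble with a few lines of Python. This is only a rough sketch, not a standard AAR format: the `Event` class, the `render_aar` function, the report title, the sample next step and the year in the timestamps are all made up for illustration (the example above gives only a date and time).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    when: datetime  # when it happened
    what: str       # what was done or observed


def render_aar(title, events, next_steps):
    """Return a plain-text after-action report with a chronological timeline."""
    lines = [f"After-Action Report: {title}", "", "Timeline"]
    # Sort by timestamp so notes jotted down out of order still read correctly.
    for event in sorted(events, key=lambda e: e.when):
        stamp = event.when.strftime("%b %d, %I:%M %p")
        lines.append(f"- {stamp}: {event.what}")
    # Only add the "Next Steps" section when work actually remains.
    if next_steps:
        lines += ["", "Next Steps"]
        lines += [f"- {step}" for step in next_steps]
    return "\n".join(lines)


# The year is a placeholder; the example in the text does not give one.
events = [
    Event(datetime(2024, 3, 10, 16, 23),
          "Logged in via command line and issued the remove command "
          "while outside the XYZ directory."),
    Event(datetime(2024, 3, 10, 16, 16),
          "Received a verbal request to delete the XYZ directory and its files."),
]

# The next step below is a hypothetical entry, included only to show the section.
report = render_aar(
    "Accidental directory removal",
    events,
    ["Confirm all affected services are stable."],
)
print(report)
```

Sorting by timestamp means you can record events in whatever order you remember them, and the report still reads chronologically.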
We all make mistakes, and if you’ve been in the tech industry for as long as I have, you’ve probably made more than one (Disclaimer: I’ve never worked for GM. It wasn’t me). When you make a mistake, the way you respond immediately after realizing it will show those you work with, and future employers, what they can expect from you and your work ethic when a crisis happens. So when you are in that interview for the job you’ve always wanted and they ask you the inevitable question, “Tell me about a mistake you made, and how you resolved it,” you’ll have the confidence to answer and show how you learned and grew from that experience.
It’s OK to make a mistake, but be prepared to take responsibility and make it right.