“Power supply unit at the center of the outage was in perfect working order and was deliberately shut down which triggered the disturbance” (The Times, 2017). According to Willie Walsh, CEO of International Airlines Group, it is understood that after the power was shut down it was “restored in an uncontrolled fashion” by the employee working at the data centre, which damaged the system and caused it to shut down completely. The shutdown was therefore a human error: an employee disconnected the UPS and then failed to follow procedure when reconnecting it. Because the correct procedure was not followed, the incident is also classed as a process failure.
The problem for British Airways was that, when the IT crash occurred, there were no adequate controls in place to deal with such a catastrophic event, and this had huge ramifications for the company. A British Airways contractor working in the data centre was responsible for the system crash. Because it was both a human error and a process error, effective use of FMEA could have prevented the system failure. FMEA is primarily used when designing a new process, such as a new way for a team to operate or new policies and procedures. It can also be used when an existing process is redesigned or applied in a new setting, to detect problems that could arise from the changes or from failures that were missed because FMEA was not used effectively in the past. British Airways failed to use FMEA effectively, which allowed the data centre shutdown to happen and dealt a huge blow to the airline’s image.
Following this incident, British Airways could conduct several FMEAs to ensure it does not happen again. One FMEA could cover the process for onboarding contractors who work within the company’s systems, so that they know how to respond correctly to an issue such as a system crash.
Another FMEA could look at the security processes surrounding the UPS. It could critically evaluate why a single contractor was able to disconnect the company’s critical IT power supply alone. It could also examine the lack of supervision in the surrounding area, which allowed the problem to escalate to the point where the company had to shut down operations for two days.
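To illustrate how such FMEAs could be prioritised, the short sketch below ranks a few failure modes by Risk Priority Number (RPN), the product of severity, occurrence and detection scores that is commonly used in FMEA worksheets. It is only a minimal sketch: the failure modes are paraphrased from the incident described above, while the 1 to 10 scores, the `FailureMode` class and the `rank_by_rpn` helper are hypothetical and chosen purely for illustration. They are not British Airways data or the output of an actual FMEA.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a simplified FMEA worksheet (hypothetical structure)."""
    description: str
    severity: int    # 1 = negligible effect, 10 = catastrophic effect
    occurrence: int  # 1 = very unlikely, 10 = almost certain
    detection: int   # 1 = almost always detected, 10 = almost never detected

    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection."""
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError("FMEA scores must be between 1 and 10")
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return the failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda mode: mode.rpn(), reverse=True)


# Illustrative scores only: assumptions, not figures from the incident.
failure_modes = [
    FailureMode("Contractor disconnects the UPS without authorisation", 10, 3, 7),
    FailureMode("Power is restored in an uncontrolled fashion", 10, 4, 8),
    FailureMode("Contractor onboarding omits data centre procedures", 7, 5, 5),
]

for mode in rank_by_rpn(failure_modes):
    print(f"RPN {mode.rpn():>3}: {mode.description}")
```

Under these assumed scores, the uncontrolled restoration of power would be addressed first, since it combines the highest severity with a low chance of being detected in time. A real FMEA team would agree its own scores and revisit them after each corrective action.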