Understanding the Cost of Data Center Downtime
April 01, 2020
Editor’s Note: This blog is an abbreviated version of the original written and published on Feb. 3, 2020 by our DCIM partner, Sunbird® Software.
In recent years, data center infrastructure has become significantly more reliable and management practices have improved, so it would be fair to expect that the number of reported downtime incidents is decreasing. But this isn’t the case.
According to a 2018 survey by Uptime Institute, 31% of respondents experienced a downtime incident or severe degradation in the last year and 48% reported at least one outage at their site or at a service provider in the last three years.
Downtime is expensive. It costs both time and money and can have grave consequences for organizations that are not sufficiently prepared. According to Gartner, downtime costs $5,600 per minute on average, or about $336,000 per hour. Depending on the organization, hourly costs range from roughly $140,000 to $540,000.
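To make the per-minute figure concrete, here is a small Python sketch that multiplies the $5,600-per-minute Gartner average by the length of an outage. It treats cost as a flat rate per minute, which is a simplifying assumption (real losses vary by organization, as the hourly range above shows), and the outage durations are hypothetical examples.

```python
# Back-of-the-envelope outage cost using the Gartner average cited in this post.
# Assumes a flat cost per minute; the example durations are hypothetical.

AVG_COST_PER_MINUTE_USD = 5_600


def downtime_cost(minutes: float, cost_per_minute: float = AVG_COST_PER_MINUTE_USD) -> float:
    """Estimated cost in USD of an outage lasting `minutes` minutes."""
    if minutes < 0:
        raise ValueError("outage duration cannot be negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    for minutes in (15, 60, 240):
        print(f"{minutes}-minute outage: ${downtime_cost(minutes):,.0f}")
    # 15-minute outage: $84,000
    # 60-minute outage: $336,000
    # 240-minute outage: $1,344,000
```

At the average rate, a one-hour outage comes to $336,000, which sits inside the hourly range quoted above.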
Human error is the leading cause of data center failure. Other common causes include network failures, power outages, UPS failures, natural disasters, and cybercrime. Fortunately, there is a solution that helps prevent downtime.
Data Center Infrastructure Management (DCIM) software helps data center managers avoid unplanned downtime that can cost hundreds of thousands of dollars per outage and wreak havoc on a business. Here are some of the ways DCIM helps prevent human error and maximize uptime:
Manage inlet air temperature and humidity. The temperature and humidity of the air at cabinet inlets matter because this is the air that flows through the cabinet and carries heat away from the equipment. If the inlet air is too warm, the equipment won’t be cooled properly. If it is too humid, there is a risk of corrosion and equipment damage. And if it is too dry, electrostatic discharge becomes a risk. Any of these conditions can cause costly downtime. DCIM software collects data from environmental sensors in the data center and displays it in business intelligence dashboards and 3D floor map visualizations, helping you monitor your data center environment and identify hot spots.
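The hot-spot check described above boils down to comparing each cabinet’s inlet readings against an acceptable envelope. The Python sketch below shows that idea under some loud assumptions: the `InletReading` record, the cabinet names, and the limits of 18 to 27 °C and 20 to 60% relative humidity are hypothetical placeholders, not values taken from Sunbird’s software or from any standard.

```python
from dataclasses import dataclass

# Placeholder limits for illustration only (assumptions, not a standard);
# substitute the envelope your site and equipment vendors specify.
INLET_TEMP_C = (18.0, 27.0)  # (min, max) inlet air temperature, deg C
INLET_RH_PCT = (20.0, 60.0)  # (min, max) inlet relative humidity, percent


@dataclass
class InletReading:
    cabinet: str   # hypothetical cabinet identifier
    temp_c: float  # inlet air temperature from an environmental sensor
    rh_pct: float  # inlet relative humidity from the same sensor


def find_issues(readings: list[InletReading]) -> dict[str, list[str]]:
    """Return {cabinet: [problems]} for every cabinet with an out-of-range reading."""
    issues: dict[str, list[str]] = {}
    for r in readings:
        problems = []
        if r.temp_c > INLET_TEMP_C[1]:
            problems.append("hot spot: inlet air too warm")
        elif r.temp_c < INLET_TEMP_C[0]:
            problems.append("inlet air colder than needed")
        if r.rh_pct > INLET_RH_PCT[1]:
            problems.append("humidity high: corrosion risk")
        elif r.rh_pct < INLET_RH_PCT[0]:
            problems.append("humidity low: static discharge risk")
        if problems:
            issues[r.cabinet] = problems
    return issues


if __name__ == "__main__":
    readings = [
        InletReading("A01", 24.0, 45.0),
        InletReading("A02", 29.5, 45.0),
        InletReading("B07", 22.0, 15.0),
    ]
    print(find_issues(readings))
    # {'A02': ['hot spot: inlet air too warm'], 'B07': ['humidity low: static discharge risk']}
```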
Safely increase temperature. Raising data center temperatures can improve energy efficiency, but it comes with the risk of overheating and damaging equipment, resulting in downtime. With DCIM, you can set temperature thresholds and receive alerts when temperatures fall outside your desired range. DCIM also helps you avoid overcooling, which improves efficiency and reduces energy costs.
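Threshold alerting can be sketched as a small state machine. In this hypothetical Python example, each sensor reading is classified as overheating, overcooled, or in range against a low and a high threshold, and an alert fires only when that state changes, so a sensor that stays hot does not generate a new alert on every reading. The class name, the sensor name, and the 20 °C and 27 °C thresholds are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class TempState(Enum):
    OK = "in range"
    TOO_COLD = "overcooled"
    TOO_HOT = "overheating"


def classify(temp_c: float, low_c: float, high_c: float) -> TempState:
    """Place a reading relative to the low/high thresholds."""
    if temp_c > high_c:
        return TempState.TOO_HOT
    if temp_c < low_c:
        return TempState.TOO_COLD
    return TempState.OK


class ThresholdAlerter:
    """Alert only when a sensor changes state, so a sensor that stays out of
    range does not raise a fresh alert on every reading."""

    def __init__(self, low_c: float, high_c: float) -> None:
        if low_c >= high_c:
            raise ValueError("low threshold must be below high threshold")
        self.low_c = low_c
        self.high_c = high_c
        self._last: dict[str, TempState] = {}  # last known state per sensor

    def update(self, sensor: str, temp_c: float) -> Optional[str]:
        """Record a reading; return an alert message if the state changed."""
        state = classify(temp_c, self.low_c, self.high_c)
        previous = self._last.get(sensor, TempState.OK)
        self._last[sensor] = state
        if state is previous:
            return None
        if state is TempState.OK:
            return f"{sensor}: back in range at {temp_c:.1f} C"
        return f"{sensor}: {state.value} at {temp_c:.1f} C"


if __name__ == "__main__":
    alerter = ThresholdAlerter(low_c=20.0, high_c=27.0)  # hypothetical thresholds
    for reading in (24.0, 27.5, 28.0, 25.0, 19.0):
        message = alerter.update("rack-12-inlet", reading)
        if message:
            print(message)
    # rack-12-inlet: overheating at 27.5 C
    # rack-12-inlet: back in range at 25.0 C
    # rack-12-inlet: overcooled at 19.0 C
```

Alerting on state changes rather than on every out-of-range sample is one common way to limit alert noise; a real deployment might also add a small hysteresis band so readings hovering right at a threshold don’t flap between states.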
Ensure power redundancy. With growing demand for computing capacity, data center cabinets are now packed more densely with power-hungry IT equipment. And because data center teams often focus on fully utilizing existing resources and delaying capital expenses, they may not realize a cabinet is overloaded until it’s too late. This makes power redundancy in the event of equipment failure a critical component of any strategy to maximize uptime. DCIM software lets you run a failover simulation report to identify which cabinets are at risk and which equipment can continue functioning safely if a PDU goes down. Data center managers can use this information to adjust loads before a real failure occurs.
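A failover simulation of the kind described above can be approximated with simple arithmetic: if one PDU in a redundant pair fails, the other must carry the whole cabinet’s load. The Python sketch below assumes every device is dual-corded and that each PDU should stay at or below 80% of its rating; both are assumptions, and the cabinet names and current readings are hypothetical.

```python
from dataclasses import dataclass

# Assumption: a PDU should carry at most 80% of its rated current, a common
# continuous-load practice. Use your site's own standard instead.
DERATING = 0.8


@dataclass
class Cabinet:
    name: str
    feed_a_amps: float      # present draw through PDU A
    feed_b_amps: float      # present draw through PDU B
    pdu_rating_amps: float  # rated current of each PDU (assumed identical)


def failover_report(cabinets: list[Cabinet]) -> list[tuple[str, float, bool]]:
    """Simulate losing one PDU per cabinet.

    Assumes every device is dual-corded, so the surviving PDU picks up the
    cabinet's entire load. Returns (name, failover load, within limit).
    """
    report = []
    for c in cabinets:
        failover_load = c.feed_a_amps + c.feed_b_amps
        within_limit = failover_load <= c.pdu_rating_amps * DERATING
        report.append((c.name, failover_load, within_limit))
    return report


if __name__ == "__main__":
    cabinets = [
        Cabinet("R01", feed_a_amps=8.0, feed_b_amps=7.5, pdu_rating_amps=30.0),
        Cabinet("R02", feed_a_amps=14.0, feed_b_amps=13.0, pdu_rating_amps=30.0),
    ]
    for name, load, ok in failover_report(cabinets):
        status = "OK" if ok else "AT RISK"
        print(f"{name}: {load:.1f} A on surviving PDU -> {status}")
    # R01: 15.5 A on surviving PDU -> OK
    # R02: 27.0 A on surviving PDU -> AT RISK
```

Note that R02 looks healthy in normal operation, since neither feed exceeds the 24 A limit, yet it would overload its surviving PDU after a failure. That is the kind of hidden risk a failover report is meant to surface.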
Health polling. Ensuring that intelligent PDUs and other devices are operating properly and reachable over your network is important to maintaining uptime. Equipment can go down without anyone noticing: a technician or engineer may accidentally put a PDU into maintenance mode, forget to power on new resources, or connect equipment to the wrong ports or with the wrong cables. DCIM software reduces the risk of outages caused by malfunctioning equipment by polling intelligent PDUs and other devices at user-configurable intervals to confirm they are accessible. If a device is unreachable, the software alerts you immediately, so you learn about the issue before it becomes a crisis.
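Health polling amounts to checking each device on a schedule and raising an alert when one stops responding. The Python sketch below uses a plain TCP connection attempt as its reachability probe. DCIM tools commonly poll PDUs over protocols such as SNMP instead, so treat the probe, the device names, the addresses, and the port as hypothetical stand-ins; the `check` parameter lets you swap in a different probe.

```python
import socket
import time
from typing import Callable, Optional

Device = tuple[str, str, int]  # (name, host, port); values used here are hypothetical


def tcp_reachable(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def poll_once(devices: list[Device],
              check: Callable[[str, int], bool] = tcp_reachable) -> list[str]:
    """Return the names of devices that fail the reachability check."""
    return [name for name, host, port in devices if not check(host, port)]


def poll(devices: list[Device], interval_s: float,
         alert: Callable[[str], None],
         check: Callable[[str, int], bool] = tcp_reachable,
         max_rounds: Optional[int] = None) -> None:
    """Poll every interval_s seconds and alert on each unreachable device.

    Runs forever unless max_rounds is given.
    """
    rounds = 0
    while True:
        for name in poll_once(devices, check):
            alert(f"{name} is not reachable")
        rounds += 1
        if max_rounds is not None and rounds >= max_rounds:
            return
        time.sleep(interval_s)


# Example (hypothetical device and address): poll every 60 s, printing alerts.
# poll([("pdu-r01-a", "10.0.0.11", 443)], interval_s=60.0, alert=print)
```

The polling interval is the user-configurable knob mentioned above: shorter intervals catch problems sooner at the cost of more network traffic.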
With DCIM, you can simulate failover and test what-if scenarios with reports that identify available capacity, ensuring coverage in case of failure. You can also visualize data center and facility health with a red-yellow-green color-coded health map that gives an at-a-glance view of rack load levels, line currents, and environmental conditions. And automated email alerts for threshold violations help you quickly identify hot spots and other potential trouble areas. With these capabilities, DCIM helps protect your infrastructure in the event of a data center disaster.
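The health map and email alerts described above can be sketched in a few lines of Python. The example below colors each rack by its load as a fraction of capacity and builds a notification with the standard library’s `email.message.EmailMessage`. The 60% and 80% color thresholds, the rack names, and the addresses are hypothetical, and this is not how any particular DCIM product generates its maps or emails.

```python
from email.message import EmailMessage
from typing import Optional

# Hypothetical color thresholds, as a fraction of rated capacity.
YELLOW_AT = 0.6
RED_AT = 0.8


def health_color(load: float, capacity: float) -> str:
    """Map a reading (rack load, line current, ...) to red/yellow/green."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    ratio = load / capacity
    if ratio >= RED_AT:
        return "red"
    if ratio >= YELLOW_AT:
        return "yellow"
    return "green"


def health_map(racks: dict[str, tuple[float, float]]) -> dict[str, str]:
    """racks maps rack name -> (load, capacity); returns rack name -> color."""
    return {name: health_color(load, cap) for name, (load, cap) in racks.items()}


def build_alert_email(colors: dict[str, str], sender: str,
                      recipient: str) -> Optional[EmailMessage]:
    """Build (but do not send) an email listing racks in the red, if any."""
    red = sorted(name for name, color in colors.items() if color == "red")
    if not red:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Threshold violation: {len(red)} rack(s) in the red"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Racks over the red threshold:\n" + "\n".join(red))
    return msg


if __name__ == "__main__":
    colors = health_map({"R01": (3.0, 10.0), "R02": (6.5, 10.0), "R03": (8.2, 10.0)})
    print(colors)  # {'R01': 'green', 'R02': 'yellow', 'R03': 'red'}
    msg = build_alert_email(colors, "dcim@example.com", "oncall@example.com")
    print(msg["Subject"])  # Threshold violation: 1 rack(s) in the red
    # Sending would go through an SMTP relay, e.g. (hypothetical host):
    # with smtplib.SMTP("mail.example.com") as smtp:
    #     smtp.send_message(msg)
```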
Data center monitoring is one of the most critical elements of maintaining the health and efficiency of your data center. Learn more about how it can help you improve the productivity of your data center team. Read the full blog.
Posted by Brittany Mangan, Digital Content Specialist at 4/1/2020 5:30:27 AM