Written by: Tim Janssen

What if data could help optimize your IT environment?

Digital innovation. Groundbreaking insights. Discovering new business opportunities. That is what often comes to mind when thinking of data. To a certain extent this is true: data can indeed be used to uncover interesting opportunities you had not thought of before. However, the benefits of using data can also be found closer to home. For instance, in optimizing your IT environment and gaining insight into its performance in terms of cost, efficiency, and security. Staying in control and safeguarding optimal performance can be challenging. How can data help with this?

In control of IT – using all data you can get your hands on

At Itility we manage quite a few IT environments. Our passion is to automate everything away and to use analytics to work smarter. That is why we love using data to stay in control. Staying in control of an IT environment requires collecting data from multiple sources, combining them, and applying statistics, machine learning, and algorithms to generate useful insights.

In a typical IT environment (such as our own Itility Cloud Control platform), you will find physical servers, storage, virtual machines, network devices, databases, applications, and so on. All of these systems produce their own data in the form of logs and metrics. And since the systems are interconnected, correlations exist among them. For example: a lack of storage can deteriorate application performance, which directly impacts the business when the application is business-critical. Combining multiple data sources helps in finding the root cause of such an issue.
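
To make this concrete, here is a minimal sketch of what combining two such data sources could look like: joining hypothetical storage and application metric exports on time and checking how strongly they move together. The file and column names are illustrative assumptions, not our actual data model.

```python
# Minimal sketch: correlate storage metrics with application response times.
# The CSV files and column names are illustrative assumptions.
import pandas as pd

storage = pd.read_csv("storage_metrics.csv", parse_dates=["timestamp"])
app = pd.read_csv("app_metrics.csv", parse_dates=["timestamp"])

# Align both sources on a shared time axis (nearest sample within one minute).
merged = pd.merge_asof(
    storage.sort_values("timestamp"),
    app.sort_values("timestamp"),
    on="timestamp",
    tolerance=pd.Timedelta("1min"),
)

# A strong correlation between free storage and response time hints at a
# shared root cause worth investigating.
corr = merged["free_storage_gb"].corr(merged["response_time_ms"])
print(f"Correlation between free storage and response time: {corr:.2f}")
```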

All that data needs to be stored in a data lake to open it up for analysis. Take, for example, an application that crashes due to a lack of storage. Instead of diving into this multifaceted problem manually, we have a monitoring solution in place that generates automated alerts based on the multiple events that took place: the lack of storage, an unresponsive virtual machine, and degraded application performance. It then automatically contacts one of our DevOps engineers to solve the issue. This is a 'responsive' or reactive way-of-working: a starting point.
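
As an illustration of that reactive flow, the sketch below correlates separate symptom events into one combined, actionable alert instead of paging once per symptom. The event names and the paging step are assumptions for illustration, not our actual monitoring stack.

```python
# Minimal sketch: correlate separate events into a single actionable alert.
# Event names and the paging step are illustrative assumptions.
RELATED_EVENTS = {"storage_full", "vm_unresponsive", "app_degraded"}

def correlate(events: set[str]) -> str | None:
    """Raise one combined alert when all related symptoms co-occur,
    instead of paging once per individual event."""
    if RELATED_EVENTS <= events:
        return ("Combined alert: storage shortage is the likely root cause "
                "of the VM and application symptoms; paging on-call engineer")
    return None

alert = correlate({"storage_full", "vm_unresponsive", "app_degraded"})
if alert:
    print(alert)  # in practice, this would call a paging or ticketing API
```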

More important to us is that we also use all that available data to prevent the problem from happening again. To do so, we turn to advanced analytics, combining the historic data of the storage component with data science techniques to predict its future state. This predictive or proactive way-of-working enables us to react to an issue before it actually occurs. The main benefit is that proactive alerting ensures that IT issues do not result in business impact for our customers, such as the application degraded by storage issues. A nice side benefit for our own team is that no DevOps engineer has to be woken up at night during stand-by for an issue that could have been prevented.
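
As a minimal sketch of what such a prediction could look like, the example below fits a linear trend to historic free-space samples and raises a proactive alert while there is still time to act. The sample data and the 30-day threshold are illustrative assumptions, not a real forecast model.

```python
# Minimal sketch: predict when storage runs out by extrapolating a
# linear trend over historic free-space samples (illustrative data).
import numpy as np

days = np.arange(10)  # observation day index
free_gb = np.array([500, 480, 465, 440, 420, 400, 385, 360, 340, 320])

slope, intercept = np.polyfit(days, free_gb, 1)  # GB change per day
days_until_full = -intercept / slope             # where the trend hits 0 GB
days_remaining = days_until_full - days[-1]

# Alert well before the projected fill date, so expansion becomes planned work.
if days_remaining < 30:
    print(f"Proactive alert: storage projected full in {days_remaining:.0f} "
          f"days; plan an expansion now")
```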

But there are more benefits to using data for predictive IT maintenance. It enables us to transform unplanned work into planned work, changing from fire-fighting mode to fire-prevention mode: less stressful and more reliable for everyone involved. Predictive alerting is one example. Another is using our data for capacity planning, to ensure we order the right capacity at the right time. We also use data for rightsizing: predicting CPU and memory usage for virtual machines and automatically downscaling or upscaling them just before it is needed. And we use data to simulate changes before we actually carry them out, predicting the impact of a change on user performance and planning any additional changes alongside it.
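
To give an idea of the rightsizing example, here is a minimal sketch that picks the smallest VM size covering the predicted peak usage plus some headroom. The size catalog and the 20% headroom are illustrative assumptions, not a specific cloud offering.

```python
# Minimal sketch: choose a VM size from predicted peak CPU usage.
# The size catalog and headroom factor are illustrative assumptions.
VM_SIZES_VCPU = [2, 4, 8, 16, 32]  # available sizes, in vCPUs

def rightsize(predicted_peak_vcpu: float, headroom: float = 0.2) -> int:
    """Return the smallest size covering the predicted peak plus headroom."""
    needed = predicted_peak_vcpu * (1 + headroom)
    for size in VM_SIZES_VCPU:
        if size >= needed:
            return size
    return VM_SIZES_VCPU[-1]  # cap at the largest available size

# A VM predicted to peak at 5.1 vCPUs needs 6.12 with headroom, so size 8.
print(rightsize(5.1))  # -> 8
```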

An IT data lake to stay in control of your IT environment.

