Abstract

More and more datacenters now cooperate with each other to achieve common, more complex goals. New advanced functionalities, such as anomaly detection and fault pattern recognition, are required to support experts during recovery and management activities. The proposed solution actively supports problem solving for datacenter management teams by automatically providing the root cause of detected anomalies. The project has been developed in Bari, using the ReCaS datacenter as a testbed. Big Data solutions have been selected to properly handle the complexity and size of the data. Features such as open-source licensing, a large community, horizontal scalability and high availability have been considered, and tools belonging to the Hadoop ecosystem have been selected. The collected information is sent to a combination of Apache Flume and Apache Kafka, used as the transport layer, which in turn delivers data to databases and processing components. Apache Spark has been selected as the analysis component. Different kinds of databases have been considered in order to satisfy multiple requirements: Hadoop Distributed File System, Neo4j, InfluxDB and Elasticsearch. Grafana and Kibana are used to show data in dedicated dashboards. The root-cause analysis engine has been implemented using custom machine learning algorithms. Finally, results are forwarded to experts by email or Slack, using Riemann.
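
The data flow described in the abstract (collection, transport layer, fan-out to storage and analysis, notification) can be illustrated with a minimal, standard-library-only Python sketch. Everything in it is an assumption made for illustration: `Transport` is an in-memory stand-in for the Flume and Kafka layer, `ListSink` stands in for the storage backends, and `ZScoreDetector` is a simple placeholder for the analysis step, not the custom machine learning algorithms used in the project. No real Kafka, Spark or Riemann API is used.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class Transport:
    """Hypothetical in-memory stand-in for the transport layer."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a consumer (storage sink or processing component).
        self._subscribers[topic].append(handler)

    def publish(self, topic, record):
        # Deliver the record to every consumer of the topic (fan-out).
        for handler in self._subscribers[topic]:
            handler(record)


class ListSink:
    """Hypothetical stand-in for a storage backend; keeps records in a list."""

    def __init__(self, name):
        self.name = name
        self.records = []

    def __call__(self, record):
        self.records.append(record)


class ZScoreDetector:
    """Placeholder analysis step (an assumption, not the paper's method).

    Flags a value that lies more than `threshold` standard deviations
    away from the mean of a sliding window of previous values.
    """

    def __init__(self, notify, window=20, threshold=3.0, min_samples=5):
        self.notify = notify
        self.window = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def __call__(self, record):
        value = record["value"]
        if len(self.window) >= self.min_samples:
            mu = mean(self.window)
            sigma = pstdev(self.window)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                # Stand-in for the notification step (email or Slack).
                self.notify(record)
        self.window.append(value)


def build_pipeline():
    """Wire the hypothetical components together."""
    transport = Transport()
    metrics_store = ListSink("time-series store")
    log_index = ListSink("search index")
    alerts = []
    transport.subscribe("metrics", metrics_store)
    transport.subscribe("metrics", ZScoreDetector(alerts.append))
    transport.subscribe("logs", log_index)
    return transport, metrics_store, log_index, alerts
```

In this sketch a single published record reaches both the storage sink and the detector, which mirrors the idea of one transport layer feeding databases and processing components in parallel. Topic names and record fields (`"metrics"`, `"logs"`, `"value"`) are illustrative only.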

Highlights

  • Data centers are growing in complexity, combining different technologies to accomplish increasingly ambitious goals

  • Unconventional tools are required to monitor the overall datacenter network, and new advanced functionalities, such as anomaly detection and fault pattern recognition, are required to support experts during recovery and management activities

  • Service malfunctions could be detected using the first source category, but this information alone does not make it possible to identify the root causes


Summary

Introduction

Data centers are growing in complexity, combining different technologies to accomplish increasingly ambitious goals. Unconventional tools are required to monitor the overall datacenter network, and new advanced functionalities, such as anomaly detection and fault pattern recognition, are required to support experts during recovery and management activities.
