Abstract

Artificial Intelligence for IT Operations (AIOps) describes the process of maintaining and operating large IT infrastructures in data centers using AI-supported methods and tools, e.g. for automated anomaly detection, root cause analysis, remediation, optimization, and the automated initiation of self-stabilizing activities. Initial results and products show that AIOps platforms can help to reach the required level of availability, reliability, dependability, and serviceability for future settings, where latency and response times are of crucial importance. Human operators see the benefits, but also the risk of losing control over the system while still being accountable for the AIOps-managed infrastructure. While automation is mandatory due to system complexity and the criticality of a QoS-bounded response, the measures compiled and deployed by the AI-controlled administration are not easily understood or reproduced. Therefore, explainability of the actions taken by the automated system is becoming a regulatory requirement for future IT infrastructures. In this paper, we address several important sub-aspects of AI governance with a focus on IT service and infrastructure management, and provide a set of rules and levels of automation that precisely describe the shared responsibility between human operators and the AIOps-controlled administration. We aim to provide guidance, decision support, and explainable processes for AIOps.
