Abstract

The big data platform always suffers from performance problems due to internal impairments (e.g. software bugs) and external impairments (e.g. resource hog). And the situation is exacerbated by the properties of velocity, variety and volume (3Vs) of big data. To recovery the system from performance anomaly, the first step is to find the root causes. In this paper, we propose a novel signature-based performance diagnosis approach to rapidly pinpoint the root causes of performance problems in big data platforms. The performance diagnosis is formalized as a pattern recognition problem. We leverage Maximum Information Criterion (MIC) to express the invariant relationships amongst the performance metrics in the normal state. Each performance problem occurred in the big data platform is signified by a unique binary vector named signature, which consists of a set of violations of MIC invariants. The signatures of multiple performance problems form a signature database. If the Key Performance Indicator (KPI) of the big data application exhibits model drift, our approach can identify the real culprits by retrieving the root causes which have similar signatures to the current performance problem. Moreover, considering the diversity of big data applications, we establish an ensemble approach to treat each application separately. The experiment evaluations in a controlled big data platform show that our approach can pinpoint the real culprits of performance problems in an average 84% precision and 87% recall when one fault occurs, which is better than several state-of-the-art approaches.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.