Abstract
Big data platforms often suffer from performance problems caused by internal impairments (e.g., software bugs) and external impairments (e.g., resource hogs). The situation is exacerbated by the velocity, variety and volume (the 3Vs) of big data. To recover a system from a performance anomaly, the first step is to find its root causes. In this paper, we propose a novel signature-based performance diagnosis approach that rapidly pinpoints the root causes of performance problems in big data platforms. We formalize performance diagnosis as a pattern recognition problem. We leverage the Maximal Information Coefficient (MIC) to express the invariant relationships among performance metrics in the normal state. Each performance problem that occurs in the big data platform is represented by a unique binary vector, called a signature, which records which MIC invariants it violates. The signatures of multiple performance problems form a signature database. When the Key Performance Indicator (KPI) of a big data application exhibits model drift, our approach identifies the real culprits by retrieving the root causes whose signatures are most similar to that of the current performance problem. Moreover, considering the diversity of big data applications, we build an ensemble approach that treats each application separately. Experimental evaluations on a controlled big data platform show that, when a single fault occurs, our approach pinpoints the real culprits with an average precision of 84% and an average recall of 87%, outperforming several state-of-the-art approaches.
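The signature construction and retrieval steps can be illustrated with a minimal Python sketch. It assumes the pairwise MIC scores for the normal state and for the current observation window have already been computed (computing MIC itself is out of scope here). The rule that marks an invariant as violated (a fixed deviation threshold), the use of Jaccard similarity to compare signatures, and all function, metric and root-cause names are hypothetical illustrations, not the authors' exact method.

```python
from typing import Dict, List, Sequence, Tuple

# A metric pair such as ("cpu", "kpi"); its value in the dicts below is an
# association score for that pair (assumed here to be a precomputed MIC value).
Pair = Tuple[str, str]


def build_signature(
    normal: Dict[Pair, float],
    current: Dict[Pair, float],
    pairs: Sequence[Pair],
    tolerance: float = 0.2,
) -> List[int]:
    """Return a binary signature: 1 where the invariant for a pair is violated.

    Hypothetical rule: an invariant counts as violated when its score in the
    current window deviates from the normal-state score by more than `tolerance`.
    """
    return [1 if abs(current[p] - normal[p]) > tolerance else 0 for p in pairs]


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity between two binary signatures of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero signatures contain no violations; treat them as identical.
    return 1.0 if either == 0 else both / either


def rank_root_causes(
    signature: Sequence[int],
    database: Dict[str, Sequence[int]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank known root causes by how similar their stored signatures are."""
    scored = [(cause, jaccard(signature, sig)) for cause, sig in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical metric pairs, scores and signature database.
    pairs = [("cpu", "kpi"), ("mem", "kpi"), ("disk", "kpi"), ("net", "kpi")]
    normal = {pairs[0]: 0.9, pairs[1]: 0.8, pairs[2]: 0.7, pairs[3]: 0.6}
    current = {pairs[0]: 0.3, pairs[1]: 0.75, pairs[2]: 0.2, pairs[3]: 0.6}
    sig = build_signature(normal, current, pairs)
    database = {
        "cpu_hog": [1, 0, 1, 0],
        "memory_leak": [0, 1, 0, 0],
        "disk_contention": [0, 0, 1, 1],
    }
    print(sig, rank_root_causes(sig, database, top_k=2))
```

Here the current window breaks the CPU and disk invariants, giving the signature `[1, 0, 1, 0]`, which matches the stored `cpu_hog` signature exactly and `disk_contention` only partially. One simple way to mirror the per-application ensemble mentioned above would be to keep a separate signature database for each application and query only the one belonging to the application whose KPI drifted; this too is an assumption for illustration.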