Abstract

The era of big data has begun. Although applications based on big data bring considerable benefits to IT industries, governments, and social organizations, they also bring more challenges to the management of big data platforms, the underlying infrastructure, because of the complexity, variety, velocity, and volume of big data. To offer a healthy platform for big data applications, we propose a novel signature-based performance diagnosis approach that employs MIC (Maximum Information Criterion) invariants between performance metrics. We formalize performance diagnosis as a pattern recognition problem. A set of MIC invariants is trained on the normal state of a big data application. Each performance problem that occurs in the application is identified by a unique binary tuple that records which MIC invariants are violated; this tuple is the signature of the problem. The signatures of all performance problems form a diagnosis knowledge database. If the KPI (Key Performance Indicator) of the application deviates from its normal range, our approach identifies the real culprits by searching the signature database for similar signatures. To detect KPI deviations, we propose a new metric named unpredictability, based on the ARIMA model. Considering the variety of big data applications, we build an ensemble performance diagnosis approach in which a dedicated ARIMA model and a dedicated set of MIC invariants are built for each kind of application. Through experimental evaluation in a controlled environment running a state-of-the-art big data benchmark, we find that our approach pinpoints the real culprits of performance problems with an average precision of 83% and an average recall of 87%, which is better than correlation-based and single-model-based performance diagnosis.
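The signature idea can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the violation rule (an absolute change in an invariant's score larger than a hypothetical `tolerance`), the similarity measure (fraction of matching positions), and the fault names in the example database are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, Sequence, Tuple

Signature = Tuple[int, ...]


def build_signature(normal_scores: Sequence[float],
                    observed_scores: Sequence[float],
                    tolerance: float = 0.2) -> Signature:
    """Return a binary tuple with 1 wherever an invariant is violated.

    Assumption: an invariant counts as violated when its observed score
    differs from the score learned in the normal state by more than
    `tolerance`. The abstract does not define the actual rule.
    """
    if len(normal_scores) != len(observed_scores):
        raise ValueError("score lists must have the same length")
    return tuple(
        1 if abs(observed - normal) > tolerance else 0
        for normal, observed in zip(normal_scores, observed_scores)
    )


def similarity(a: Signature, b: Signature) -> float:
    """Fraction of positions where two signatures agree (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def diagnose(signature: Signature,
             database: Dict[str, Signature]) -> Tuple[str, float]:
    """Return the known problem whose signature is most similar."""
    name = max(database, key=lambda k: similarity(signature, database[k]))
    return name, similarity(signature, database[name])


# Hypothetical invariant scores for four metric pairs.
normal = [0.90, 0.80, 0.85, 0.70]
observed = [0.30, 0.75, 0.20, 0.72]

# Hypothetical signature database; the fault names are illustrative only.
known_problems = {
    "cpu_hog": (1, 0, 1, 0),
    "disk_contention": (0, 1, 0, 1),
    "network_jam": (1, 1, 0, 0),
}

sig = build_signature(normal, observed)
culprit, score = diagnose(sig, known_problems)
print(sig, culprit, score)
```

In this sketch the lookup returns only the single closest signature; a real diagnosis system would more plausibly rank several candidates and apply a minimum-similarity threshold before reporting a culprit.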

