Abstract

In this paper we evaluate a measurement-based approach to predicting the performance of data-intensive applications running over NoSQL systems. While the use of systematic measurements for building performance prediction models is a well-studied topic, little attention has been paid so far to the application space of data-intensive systems using NoSQL databases. Measurement-based performance prediction approaches are often limited by the relatively narrow range of hardware characteristics available within each organization's private infrastructure. The emergence of federated, publicly accessible, large-scale research infrastructures, such as Fed4FIRE and GENI, which feature a variety of heterogeneous hardware, offers an opportunity to overcome this limitation. This paper demonstrates accurate measurement-based performance prediction modeling for NoSQL systems over such public infrastructures. We consider three prominent regression techniques: multivariate adaptive regression splines (MARS), support vector regression (SVR), and artificial neural network (ANN) regression, applied to the YCSB data-intensive benchmark over the MongoDB NoSQL data store. Our measurements are drawn from 1-, 3-, and 5-node clusters of four node types. Performance prediction using MARS yields the best results, with an average accuracy of 97.85%, vs. 94.16% and 90.39% with SVR and ANN, respectively. The approach can seamlessly extend to a wider range of hardware specifications available in federated research infrastructures.
