Abstract

Apache Cassandra is a highly scalable and available NoSQL datastore, widely used by enterprises of every size and in application areas that range from entertainment to big data analytics. Managed Cassandra service providers are emerging to hide the complexity of installing, fine-tuning and operating Cassandra virtual data centers (VDCs). This paper addresses the problem of energy-efficient auto-scaling of Cassandra VDCs in managed Cassandra data centers. We propose three energy-aware auto-scaling algorithms: Opt, LocalOpt and LocalOpt-H. The first provides the optimal scaling decision, orchestrating horizontal scaling, vertical scaling and optimal placement. The other two are heuristics and provide sub-optimal solutions. Both orchestrate horizontal scaling and optimal placement; LocalOpt also considers vertical scaling. In this paper: we analyse the computational complexity of the optimal and of the heuristic auto-scaling algorithms; we discuss the issues in auto-scaling Cassandra VDCs and provide best practices for using auto-scaling algorithms; and we evaluate the performance of the proposed algorithms under programmed SLA variation, unexpected throughput surges and failures of physical nodes. We also compare the performance of the energy-aware auto-scaling algorithms with that of two energy-blind auto-scaling algorithms, namely BestFit and BestFit-H. The main findings are the following. VDC allocation that aims at reducing energy consumption, or resource usage in general, can heavily reduce the reliability of Cassandra in terms of the consistency level offered. Horizontal scaling of Cassandra is very slow, which makes it hard to manage throughput surges. Vertical scaling is a valid alternative, but it is not supported by all cloud infrastructures.

Highlights

  • Today, data storage or serving systems such as Apache Cassandra and HBase, Amazon SimpleDB and Dynamo, and Google BigTable play an important role in the cloud and big data industry because of the unprecedented scalability and availability they achieve by means of data replication.

  • We provide a simple model to assess how the consistency level of a Cassandra virtual data center (VDC) is impacted by auto-scaling and by the placement of vnodes on physical machines.

  • From the analysis of the experimental data and of the literature we conclude that, for CPU-bound workloads, the throughput of a Cassandra VDC serving requests of type $l_i$ and running on virtual machines (VMs) of type $j$ can be approximated with a set of linear segments with slopes $\delta^{k}_{l_i,j}$. Here $\delta^{k}_{l_i,j}$ is the slope of the $k$th segment, and it is valid for a number of Cassandra vnodes $n_i$ between $n_{k-1}$ and $n_k$.
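
The piecewise-linear throughput approximation in the last highlight can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the continuity of the curve across breakpoints, the `base` throughput at the first breakpoint, and all numeric values are assumptions made only for illustration.

```python
from typing import Sequence


def piecewise_throughput(
    n: float,
    breakpoints: Sequence[float],
    slopes: Sequence[float],
    base: float = 0.0,
) -> float:
    """Approximate throughput for n vnodes with linear segments.

    breakpoints: n_0 < n_1 < ... < n_K (segment boundaries, in vnodes).
    slopes:      one slope per segment; slopes[k] applies between
                 breakpoints[k] and breakpoints[k + 1].
    base:        throughput at n_0 (assumed; not given in the source).
    """
    if len(breakpoints) != len(slopes) + 1:
        raise ValueError("need exactly one more breakpoint than slopes")
    if not breakpoints[0] <= n <= breakpoints[-1]:
        raise ValueError("n is outside the range covered by the segments")

    total = base
    for k, slope in enumerate(slopes):
        lo, hi = breakpoints[k], breakpoints[k + 1]
        if n <= lo:
            break  # remaining segments start beyond n
        # Add the contribution of the part of this segment below n.
        total += slope * (min(n, hi) - lo)
    return total


# Hypothetical values for one request type and one VM type:
# 1000 ops/s per vnode up to 4 vnodes, then 500 ops/s per vnode up to 8.
example_breakpoints = [0, 4, 8]
example_slopes = [1000.0, 500.0]
print(piecewise_throughput(6, example_breakpoints, example_slopes))  # 5000.0
```

With these made-up numbers, going from 4 to 6 vnodes adds only 1000 ops/s, whereas going from 2 to 4 adds 2000 ops/s. A decreasing slope of this kind is what makes the marginal benefit of adding vnodes depend on the current VDC size.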


Summary

Introduction

Data storage or serving systems such as Apache Cassandra and HBase, Amazon SimpleDB and Dynamo, and Google BigTable play an important role in the cloud and big data industry because of the unprecedented scalability and availability they achieve by means of data replication. Resource management for these data storage platforms is a challenging task, and the complexity increases when multitenancy is considered. Minimisation of energy consumption is one of the strategies adopted to reduce costs when service providers run their own data centers. To address this problem we propose three energy-aware auto-scaling algorithms (Opt, LocalOpt and LocalOpt-H) designed for Cassandra virtual data centers (VDCs) running on a cloud infrastructure, and we compare their performance. The average time to find the optimum using the MATLAB MILP solver is about 50 s, with a maximum of about $2 \times 10^{3}$ s.

Research contribution
Paper organization
Related works
Reference scenario
Adaptation model
Workload and SLA model
Architecture model
Throughput model
Power consumption model
The optimal auto-scaling
Heuristics
Computational cost
Recommendations on the use of the auto-scaling algorithms
Performance evaluation methodology
Scenarios
Performance metrics
Setup of the experiments
Experimental results
Throughput surge
Physical node failures
Concluding remarks