Abstract

Today, scalable and highly available NoSQL distributed databases are widely used as Big Data platforms. Such databases typically run on a virtualized infrastructure implemented using either hypervisor-based or container-based virtualization. Hypervisor-based virtualization is a mature technology but imposes overhead on CPU, memory, networking, and disk. Container-based virtualization has recently become more popular because it shares operating system resources and simplifies application deployment. It is lightweight in resource consumption while still providing isolation; its disadvantages are security issues and I/O performance. As a result, these two technologies now compete to provide virtual instances for running Big Data platforms, and assessing their performance when running distributed databases becomes a key issue. This paper presents an extensive performance comparison between VMware and Docker containers, with Apache Cassandra as the workload. Apache Cassandra is a leading NoSQL distributed database for Big Data platforms. As the baseline for comparison, we used Cassandra's performance when running on physical infrastructure. Our study shows that Docker had lower overhead than VMware when running Cassandra. In fact, Cassandra's performance on the Dockerized infrastructure was as good as on the non-virtualized infrastructure.
