Abstract

With the growing need to handle and store data, NoSQL database management systems (DBMS) such as Cassandra have become increasingly popular. Cassandra stores data in rows, much like a relational database, but it does not provide join operations. The usual solution for any problem that requires joining tables is to denormalize them. Nevertheless, we argue that join operations may still be needed when new, unexpected requirements arise that the database design did not anticipate. This research aims to develop a library that provides join operations for Cassandra. We begin by examining how Cassandra works internally, the form in which it stores data, and how data is retrieved from it. We then analyze the feasibility and performance of candidate join algorithms to decide which to implement, and conclude that the hybrid hash join and nested loop join algorithms are the two feasible options for Cassandra. We then build the join library on these algorithms. The library supports both inner and outer joins, on both equi-join and non-equi-join conditions. In our performance tests, the hybrid hash join algorithm performs well on small to large data sets, while the nested loop join algorithm is slower on large data sets.
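
To make the two algorithms concrete, here is a minimal Python sketch over rows represented as dicts (for example, rows already fetched from two Cassandra tables by a client driver). It is not the library described above: the function names `hash_join` and `nested_loop_join` and the sample `users`/`orders` tables are hypothetical. `hash_join` is the simple in-memory build-and-probe hash join; a hybrid hash join extends it by partitioning both inputs on the join key, keeping one partition's hash table in memory and spilling the remaining partitions to disk for later passes. `nested_loop_join` compares every pair of rows, so it accepts an arbitrary predicate (and therefore non-equi-joins) at a cost proportional to |left| x |right|. Both functions produce a left outer join when `outer=True`.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Pair = Tuple[Row, Optional[Row]]  # (left row, matching right row or None)


def hash_join(left: List[Row], right: List[Row], left_key: str,
              right_key: str, outer: bool = False) -> List[Pair]:
    """Equi-join via an in-memory hash table (hypothetical helper)."""
    # Build phase: group the right-hand rows by their join-key value.
    table: Dict[Any, List[Row]] = {}
    for r in right:
        table.setdefault(r[right_key], []).append(r)
    out: List[Pair] = []
    # Probe phase: look up each left row's key; matches cost O(1) on average.
    for l in left:
        matches = table.get(l[left_key], [])
        for r in matches:
            out.append((l, r))
        # Left outer join: keep unmatched left rows, padded with None.
        if outer and not matches:
            out.append((l, None))
    return out


def nested_loop_join(left: List[Row], right: List[Row],
                     predicate: Callable[[Row, Row], bool],
                     outer: bool = False) -> List[Pair]:
    """Join on any predicate by testing every (left, right) pair (hypothetical helper)."""
    out: List[Pair] = []
    for l in left:
        matched = False
        for r in right:
            if predicate(l, r):
                out.append((l, r))
                matched = True
        if outer and not matched:
            out.append((l, None))
    return out


# Hypothetical sample rows, not data from the paper.
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [{"order_id": 10, "user_id": 1, "total": 30},
          {"order_id": 11, "user_id": 1, "total": 5}]

inner = hash_join(users, orders, "user_id", "user_id")
left_outer = hash_join(users, orders, "user_id", "user_id", outer=True)
# Non-equi-join: orders whose total exceeds 10 times the user's id.
non_equi = nested_loop_join(users, orders,
                            lambda u, o: o["total"] > 10 * u["user_id"])
```

Here `inner` pairs ann with orders 10 and 11, `left_outer` additionally keeps bob paired with `None`, and `non_equi` pairs both users with order 10. The hash join cannot express `non_equi`, because its hash table only answers equality lookups, which is why the nested loop join remains necessary for non-equi conditions.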
