Abstract
Portals and gateways increasingly offer users complex interfaces for interacting with massive data sets. As dealing with big data becomes more commonplace, portal and gateway developers need to readdress how data is stored and rethink the supporting infrastructure that enables quick, simple access to and analysis of that data. It is becoming evident that traditional relational databases are not always the most appropriate solution for giving users on-demand access to big data sets. In this study we show that non-relational, NoSQL databases such as key-value stores and document stores can offer large benefits in performance, accessibility, and availability. We present a use case from the TeraGrid User Portal that demonstrates solutions for processing and auditing user job data efficiently in order to provide users rapid access to this data.

One of the goals of the TeraGrid User Portal is to offer users and PIs detailed job statistics, such as service unit (SU) usage and job history, via the user portal interface. While building a portal application to analyze batch job data records in the TeraGrid Central Database (TGCDB), we quickly ran into stumbling blocks. The TGCDB holds over 17 million job records from December 2003 through March 2011; between January 2011 and April 2011 alone, over 2.8 million job records were added. This data is growing at an ever-faster rate and will continue to grow as new computing resources become available. Even properly indexed tables took longer to query than a responsive portal application allows. Our interim solution was to cache the jobs query results and serve those cached results from the portal. This addressed the speed of the query but not the underlying problem of dealing with a massive data set: we still needed the rich query interface that a database provides.

To solve these issues we evaluated two options. First, we tested moving the TGCDB to a newer, faster machine than the one it currently runs on, to determine how much of the bottleneck was due to aging hardware. Second, we tested migrating the jobs data out of the relational PostgreSQL TGCDB and into a document store, Apache CouchDB, in place of the flat-file cache we had been using. CouchDB is a document-oriented database that is queried using MapReduce. CouchDB also offers specific benefits for portals and gateways, providing a RESTful JSON API accessed via HTTP requests. Illustrative sketches of both access patterns follow below.

Our initial tests show that moving the TGCDB to new hardware provides an average query speedup of 3.7x for the job queries we tested. Querying the same data using MapReduce queries against CouchDB gave an additional 8.24x speedup, for a total average speedup of 30.6x over the current TGCDB. The large speedups offered by CouchDB come at the cost of additional disk usage: CouchDB maintains B-tree indices on the document store as well as on any defined queries, or "views". These indices use more disk space than a relational database would, but they enable CouchDB to take full advantage of high-performance disks and file systems.

We show that the performance gained by using a data warehouse for certain large data sets can offer great benefits when building on-demand data analysis tools in portals and gateways. By identifying such large data sets, like the TeraGrid jobs data, and migrating them to high-performance data stores such as CouchDB, we can make much more information readily available to users.
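For illustration, the kind of per-user aggregation the portal needs might look like the hypothetical query below. This is a minimal sketch, not the portal's actual code: the connection string and the `jobs` table with its `username`, `su_charged`, and `submit_time` columns are all assumptions, and the real TGCDB schema differs. Even with appropriate indexes, computing aggregates like this over millions of rows at request time is what motivated caching the results.

```python
import psycopg2  # PostgreSQL driver; the TGCDB is a PostgreSQL database

# Hypothetical schema: one row per batch job record.
QUERY = """
    SELECT username, SUM(su_charged) AS total_sus, COUNT(*) AS job_count
      FROM jobs
     WHERE submit_time >= %s AND submit_time < %s
     GROUP BY username
     ORDER BY total_sus DESC;
"""

# Connection parameters are placeholders, not the production settings.
conn = psycopg2.connect("dbname=tgcdb")
with conn, conn.cursor() as cur:
    # Aggregate SU usage per user for the January-March 2011 window.
    cur.execute(QUERY, ("2011-01-01", "2011-04-01"))
    for username, total_sus, job_count in cur:
        print(username, total_sus, job_count)
```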
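The CouchDB side replaces that ad hoc SQL with a precomputed MapReduce view queried over HTTP. The sketch below is again hypothetical in its specifics: the database name `tg_jobs` and the document fields `username` and `su_charged` are assumptions, though the design-document and `_view` endpoints are CouchDB's standard REST API.

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # hypothetical CouchDB instance
DB = "tg_jobs"                   # hypothetical database of job documents

def request(method, path, body=None):
    """Small helper around CouchDB's RESTful JSON-over-HTTP API."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(COUCH + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# A design document holding a MapReduce view. The map function emits one
# row per job keyed by username; the built-in _sum reduce totals the SUs.
design = {
    "views": {
        "su_by_user": {
            "map": "function(doc) { emit(doc.username, doc.su_charged); }",
            "reduce": "_sum",
        }
    },
}
request("PUT", f"/{DB}/_design/jobs", design)

# Query the view, grouping by key to get total SU usage per user. CouchDB
# serves the result from the view's persistent B-tree index, so repeated
# portal requests read precomputed totals instead of re-aggregating.
result = request("GET", f"/{DB}/_design/jobs/_view/su_by_user?group=true")
for row in result["rows"]:
    print(row["key"], row["value"])
```

Because CouchDB updates a view's index incrementally as new job documents arrive, grouping per resource or per month instead is just a matter of changing the emitted key; this is the property that lets the portal trade extra disk for consistently fast reads.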