Abstract

In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve the issue of rigid data structures present in relational databases by introducing semi-structured and flexible schema design. As current data generated by different sources and devices, especially IoT sensors and actuators, use either the XML or the JSON format, depending on the application, database technologies that store and query semi-structured data in XML format are needed. Thus, Native XML Databases, which were initially designed to manipulate XML data using standardized query languages, i.e., XQuery and XPath, were rebranded as NoSQL Document-Oriented Database Systems. Currently, the majority of these solutions have been replaced with the more modern JSON-based Database Management Systems. However, we believe that XML-based solutions can still perform well when executing complex queries on heterogeneous collections. Unfortunately, current research lacks a clear comparison of the scalability and performance of database technologies that store and query documents in XML versus the more modern JSON format. Moreover, to the best of our knowledge, there are no Big Data-compliant benchmarks for such database technologies. In this paper, we present a comparison of selected Document-Oriented Database Systems that encode documents either in the XML format, i.e., BaseX, eXist-db, and Sedna, or in the JSON format, i.e., MongoDB, CouchDB, and Couchbase. To underline the performance differences, we also propose a benchmark that uses a heterogeneous, complex schema on a large DBLP corpus.

Highlights

  • With the emergence of Big Data and the Internet of Things (IoT) and the increasing amount of semi-structured information generated daily, new technologies have arisen for storing, managing, and extracting information and patterns from such data

  • We further focus on two subcategories of Document-Oriented Database Management Systems (DODBMSes) with respect to the data model used to encode documents: i) DODBMSes that encode data using the XML format are Native XML Database Management Systems (XDBMSes), and ii) DODBMSes that encode data using the JSON format are JSON Database Management Systems (JDBMSes)

  • We present an overview and comparison of DODBMSes that encode information using XML and JSON formats and propose a benchmark using filtering and aggregation queries on a heterogeneous dataset


Summary

Introduction

With the emergence of Big Data and the Internet of Things (IoT) and the increasing amount of semi-structured information generated daily, new technologies have arisen for storing, managing, and extracting information and patterns from such data. The new technologies for storing data have been labeled NoSQL and were initially developed to solve very specific problems. They offer different trade-offs and functionality (e.g., choosing high availability over consistency) in an effort to be as generic as their counterparts, Relational Database Management Systems (RDBMSes). Due to the semi-structured nature of data, NoSQL Database Management Systems (DBMSes) have been classified based on the data model used for storing information [1], i.e., key-value, document-oriented, wide column, and graph databases. We further focus on two subcategories of Document-Oriented Database Management Systems (DODBMSes) with respect to the data model used to encode documents: i) DODBMSes that encode data using the XML format are Native XML Database Management Systems (XDBMSes), and ii) DODBMSes that encode data using the JSON format are JSON Database Management Systems (JDBMSes).

