Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data

João Antas, Jorge Bernardino, Rodrigo Rocha Silva

doi:10.3390/computers11020029

Abstract

COVID-19 has had enormous negative impacts on human lives and the world economy. To help in the fight against this pandemic, this study evaluates different database systems and selects the most suitable one for storing, handling, and mining COVID-19 data. We evaluate SQL and NoSQL database systems using the following metrics: query runtime, memory used, CPU used, and storage size. The database systems assessed were Microsoft SQL Server, MongoDB, and Cassandra. We also evaluate Data Mining algorithms, including Decision Trees, Random Forest, Naive Bayes, and Logistic Regression, through data classification tests in the Orange Data Mining software. The classification tests were performed using cross-validation on a table with about 3 M records of COVID-19 exams, including patients’ symptoms. The Random Forest algorithm obtained the best average accuracy, recall, precision, and F1 Score for the COVID-19 predictive model built in the mining stage. In the performance evaluation, MongoDB presented the best results for almost all tests with a large data volume.
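The abstract mentions classification tests evaluated with cross-validation. The sketch below illustrates k-fold cross-validation using a small categorical Naive Bayes classifier written from scratch in Python. It is not Orange's implementation: the function names, the two binary symptom features (fever, cough), and the labels are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_naive_bayes(X, y):
    """Count class priors and per-feature value frequencies (categorical Naive Bayes)."""
    priors = Counter(y)
    counts = defaultdict(Counter)  # (class, feature index) -> Counter of feature values
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            counts[(label, j)][v] += 1
    return priors, counts


def predict_naive_bayes(model, row, n_values=2):
    """Return the class with the highest log-posterior, using Laplace smoothing."""
    priors, counts = model
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for c, nc in priors.items():
        score = math.log(nc / total)
        for j, v in enumerate(row):
            score += math.log((counts[(c, j)][v] + 1) / (nc + n_values))
        if score > best_score:
            best, best_score = c, score
    return best


def cross_validate(X, y, k=5):
    """Mean accuracy over k folds: train on k-1 folds, test on the held-out fold."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train_naive_bayes([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_naive_bayes(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical toy data: (fever, cough) -> exam result.
X = [(1, 1)] * 20 + [(0, 0)] * 20 + [(1, 0)] * 10 + [(0, 1)] * 10
y = ["positive"] * 20 + ["negative"] * 40
print(f"5-fold mean accuracy: {cross_validate(X, y, k=5):.3f}")
```

Each row is used for testing exactly once, so the averaged score estimates how the model generalizes rather than how well it memorizes its training data.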

Highlights

Affiliations: Polytechnic of Coimbra, Coimbra Institute of Engineering (ISEC), 3030-199 Coimbra, Portugal; Centre of Informatics and Systems of University of Coimbra (CISUC), 3030-290 Coimbra, Portugal; FATEC Mogi das Cruzes, São Paulo Technological College, Mogi das Cruzes 08773-600, Brazil
This work focuses on two main areas: SQL and NoSQL databases, and Data Mining.
Orange Data Mining was connected to Microsoft SQL Server to perform the Data Mining tests; an audit trail monitored all the queries that Orange needed to run against the database during the tests.
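SQL Server's audit feature is not reproduced here. As a loose analogue of recording every query a client tool issues, this hypothetical Python sketch uses the standard `sqlite3` module's `set_trace_callback` to log each SQL statement a connection executes; the `exams` table and its columns are illustrative assumptions, not the study's schema.

```python
import sqlite3


def open_audited(path=":memory:"):
    """Open a SQLite connection that appends the SQL text of every executed statement to a list."""
    conn = sqlite3.connect(path)
    audit_log = []
    # SQLite calls this with the SQL text of each statement it runs.
    conn.set_trace_callback(audit_log.append)
    return conn, audit_log


conn, audit_log = open_audited()
conn.execute("CREATE TABLE exams (id INTEGER PRIMARY KEY, fever INTEGER, result TEXT)")
conn.executemany("INSERT INTO exams (fever, result) VALUES (?, ?)", [(1, "positive"), (0, "negative")])
rows = conn.execute("SELECT result, COUNT(*) FROM exams GROUP BY result").fetchall()

for stmt in audit_log:
    print(stmt)
```

Because the callback sits on the connection, any client that uses it (here, the script itself) is traced without changing the queries.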

Summary

Introduction

The human species has already witnessed several pandemics during its existence. A pandemic is an epidemic occurring on a scale that crosses international boundaries, usually affecting many people [1]. Data Mining algorithms were applied to classification problems to extract insights from the collected data and to develop a COVID-19 predictive model with suitable accuracy. We evaluate one SQL database, Microsoft SQL Server [11], and two of the most popular NoSQL databases, MongoDB [12] and Cassandra [13]. This evaluation was performed on real COVID-19 datasets by comparing the databases in terms of query runtime, RAM consumed, CPU percentage used, and data storage size. The first goal of this work is to create a COVID-19 database and mine it to extract insights from the data.
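A minimal sketch of how query runtime, CPU utilisation, and storage size could be measured from Python, assuming an in-memory SQLite database as a stand-in for the SQL Server, MongoDB, and Cassandra deployments evaluated in the study. CPU percentage is approximated as process CPU time divided by wall-clock time, and RAM measurement is omitted because it needs a third-party library. The `covid_exams` table and the helper names are hypothetical.

```python
import sqlite3
import time


def profile_query(conn, sql, repeats=5):
    """Run a query several times; return mean wall-clock seconds and CPU utilisation (%)."""
    wall = cpu = 0.0
    for _ in range(repeats):
        w0, c0 = time.perf_counter(), time.process_time()
        conn.execute(sql).fetchall()  # fetch every row so the full result is produced
        wall += time.perf_counter() - w0
        cpu += time.process_time() - c0
    cpu_pct = 100.0 * cpu / wall if wall > 0 else 0.0
    return wall / repeats, cpu_pct


def storage_bytes(conn):
    """Approximate database size as page_count * page_size."""
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return pages * page_size


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid_exams (id INTEGER PRIMARY KEY, age INTEGER, result TEXT)")
conn.executemany(
    "INSERT INTO covid_exams (age, result) VALUES (?, ?)",
    ((i % 90, "positive" if i % 7 == 0 else "negative") for i in range(10_000)),
)
runtime_s, cpu_pct = profile_query(conn, "SELECT result, COUNT(*) FROM covid_exams GROUP BY result")
print(f"mean runtime {runtime_s * 1000:.2f} ms, CPU {cpu_pct:.0f}%, size {storage_bytes(conn)} bytes")
```

Repeating each query and averaging reduces the influence of caching and timer noise on a single measurement.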

Related Work

SQL versus NoSQL Databases

Data Mining on COVID-19 Data

Data Mining

Algorithms

Naïve Bayes

Decision Tree

Random Forest

Logistic Regression

Data Modeling

Experimental Evaluation

Data Mining Experiments—Classification Tests

F1 Score is the harmonic mean of Precision and Recall
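The F1 Score combines precision P and recall R as their harmonic mean, F1 = 2PR / (P + R). The hypothetical helper below computes accuracy, precision, recall, and F1 for a binary label from true and predicted values; it does not reproduce how Orange averages these metrics across classes.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean: penalises a large gap between precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


acc, prec, rec, f1 = binary_metrics([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0])
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
# accuracy=0.500 precision=0.667 recall=0.500 f1=0.571
```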

SQL and NoSQL Database Experiments

Figures 11, 13, 14, 15, and 20: runtime for queries (Figure 20 also reports CPU usage).

Conclusions and Future Work



Journal: Computers (Basel, Switzerland)
Publication Date: Feb 21, 2022
Citations: 6
License type: CC BY 4.0

Similar Papers

MongoDB NoSQL Injection Analysis and Detection
Boyu Hou ... Jigang Liu
01 Jun 2016

A Novel Approach to Improve the Performance of the Database Storing Big Data with Time Information
Murat Taşyürek
Balkan Journal of Electrical and Computer Engineering | VOL. 10
19 Oct 2022

COMPARISON OF ANN METHOD AND LOGISTIC REGRESSION METHOD ON SINGLE NUCLEOTIDE POLYMORPHISM GENETIC DATA
Adi Setiawan ... Rachel Wulan Nirmalasari Wijaya
BAREKENG: Jurnal Ilmu Matematika dan Terapan | VOL. 17
16 Apr 2023

Deep learning based diagnosis for cysts and tumors of jaw with massive healthy samples
Dan Yu ... Jiacong Hu
Scientific Reports | VOL. 12
03 Feb 2022
