Comparing SQL and NoSQL approaches for clustering over big data

Filipe Assunção, Manuel Levi, Pedro Furtado

International Journal of Business Process Integration and Management

doi:10.1504/ijbpim.2015.073657

Abstract

Data mining is the process of discovering patterns in large datasets. With the exponential growth of available information, new machine learning, statistical and other analytics techniques have to be developed to meet the processing demands of performing such analysis fast enough to be useful. In this study, techniques such as cluster analysis are applied to generated data to perform customer segmentation, and system performance is evaluated by measuring processing time. The data used in this paper is generated with the Star Schema Benchmark (SSB). Our main goal is to find a scalable solution for running data mining over a decision support benchmark. Four systems are tested: single-node MySQL, MySQL Cluster, Apache Mahout and R. By running MySQL Cluster and Mahout, each distributed over four nodes, the paper compares the performance of k-means run in parallel. Single-node MySQL and R allow this parallel execution to be compared with methods running on a single machine, on both relational and non-relational systems.
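The abstract compares k-means run in parallel on MySQL Cluster and Mahout, each over four nodes. The sketch below is a minimal illustration under our own assumptions, not the authors' code and not Mahout's or MySQL's API. It shows the usual way one k-means (Lloyd) iteration is split across nodes: each partition assigns its points to the nearest centroid and returns per-cluster coordinate sums and counts, and a coordinator merges these partial results into new centroids. The function names `partial_sums` and `kmeans_partitioned`, the two customer features and the toy data are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def partial_sums(partition: Sequence[Point], centroids: List[Point]):
    """'Map' step run on one node (hypothetical helper): assign each point to
    its nearest centroid and accumulate per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in partition:
        # Nearest centroid by squared Euclidean distance.
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def kmeans_partitioned(partitions: List[List[Point]], k: int,
                       iterations: int = 20, seed: int = 0) -> List[Point]:
    """'Reduce' step on a coordinator (hypothetical): merge the partial sums
    from every partition into new centroids until they stop changing."""
    rng = random.Random(seed)
    all_points = [p for part in partitions for p in part]
    centroids = [tuple(p) for p in rng.sample(all_points, k)]
    dim = len(centroids[0])
    for _ in range(iterations):
        results = [partial_sums(part, centroids) for part in partitions]
        new_centroids = []
        for j in range(k):
            total = sum(counts[j] for _, counts in results)
            if total == 0:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
                continue
            new_centroids.append(tuple(
                sum(sums[j][d] for sums, _ in results) / total for d in range(dim)))
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids


# Toy data: two hypothetical customer features (e.g. spend, order count),
# split round-robin over four "nodes".
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.0),
             (10.0, 10.0), (10.2, 9.8), (9.9, 10.1), (10.1, 10.0)]
partitions = [customers[i::4] for i in range(4)]
centroids = kmeans_partitioned(partitions, k=2)
```

Only the small per-cluster sums and counts travel between nodes in each iteration, not the raw points. This is what lets the assignment work scale out across a four-node setup like the one the abstract describes.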
