Big Data Mining Techniques

Adeel Shiraz Hashmi, Tanvir Ahmad

doi:10.17485/ijst/2016/v9i37/85826

Open Access

Abstract

Objectives: This work discusses three techniques that can be used for mining big data: sampling, incremental learning, and distributed learning. Methods: A literature survey was carried out to identify the techniques that different authors have employed to handle large (and streaming) data sets. For each technique, one or more algorithms were chosen and applied to large data sets. The platform was standardized across techniques (R libraries were used for every algorithm), and the algorithms were compared on accuracy and time consumed. Findings: The findings, which conform to the existing literature, are that distributed learning is the best approach for large data sets in terms of both accuracy and time complexity. However, if the data sets are streaming and real-time analysis is required, the sampling or incremental approach is better than the distributed approach: the incremental approach provides better accuracy, whereas sampling reduces time complexity. Novelty: This study brings all three techniques together on a single platform, which has not been done earlier.
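
To make the three techniques named in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation (the study used R libraries, and the abstract does not name the algorithms). It assumes a simple nearest-centroid classifier as a stand-in model, because its per-class statistics can be updated one example at a time (incremental learning) and added together across partitions (distributed learning). Sampling is shown with reservoir sampling. All names here (`reservoir_sample`, `CentroidModel`, `fit_distributed`, `make_data`) are hypothetical, and the synthetic data are for illustration only.

```python
import random
from collections import defaultdict


def reservoir_sample(stream, k, seed=0):
    """Sampling: keep a uniform random sample of k items from a stream."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                sample[j] = item
    return sample


class CentroidModel:
    """Nearest-centroid classifier stored as per-class (sum vector, count)."""

    def __init__(self):
        self.sums = {}
        self.counts = defaultdict(int)

    def update(self, x, label):
        """Incremental learning: fold one labelled example into the model."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
        for d, v in enumerate(x):
            self.sums[label][d] += v
        self.counts[label] += 1

    def merge(self, other):
        """Distributed learning: add in statistics from another partition."""
        for label, s in other.sums.items():
            if label not in self.sums:
                self.sums[label] = [0.0] * len(s)
            for d, v in enumerate(s):
                self.sums[label][d] += v
            self.counts[label] += other.counts[label]
        return self

    def predict(self, x):
        """Return the class whose centroid is closest to x."""
        def dist2(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))
        return min(self.sums, key=dist2)


def fit(examples):
    """Train by feeding examples to the model one at a time."""
    model = CentroidModel()
    for x, label in examples:
        model.update(x, label)
    return model


def fit_distributed(examples, n_parts):
    """Train one model per partition, then merge the partial models."""
    parts = [examples[i::n_parts] for i in range(n_parts)]
    merged = CentroidModel()
    for part in parts:  # each fit(part) could run on a separate worker
        merged.merge(fit(part))
    return merged


def make_data(n, seed=1):
    """Synthetic two-class data (an assumption, for illustration only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice(["a", "b"])
        centre = 0.0 if label == "a" else 4.0
        data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return data


def accuracy(model, examples):
    return sum(model.predict(x) == y for x, y in examples) / len(examples)


if __name__ == "__main__":
    train, test = make_data(2000), make_data(500, seed=2)
    models = {
        "sampling": fit(reservoir_sample(iter(train), 200)),
        "incremental": fit(train),
        "distributed": fit_distributed(train, 4),
    }
    for name, model in models.items():
        print(name, accuracy(model, test))
```

In this toy setting the sampled model sees only a fraction of the data, which is where the time saving comes from, while the incremental and distributed models use every example. The sketch says nothing about the accuracy or timing results reported in the paper.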

Journal: Indian Journal of Science and Technology
Publication Date: Oct 13, 2016
Citations: 1
License type: CC BY