Abstract

The exponential growth of the amount of data circulating on the web has led to the emergence of the big data phenomenon. This growth is a natural consequence of the proliferation of social media, mobile devices, the abundance of free online storage, and new technologies such as the Internet of Things. Big data has in turn created several challenges for the computer science community, among which the sheer size of the data is the most pressing. Traditional machine learning algorithms, used mostly for insight extraction, prove inadequate at this scale, even on high-performance computer architectures. Big data analytics algorithms can overcome the size issue in one of two ways: (1) adapting existing machine learning techniques to the scale of big data; or (2) sampling big datasets, i.e., randomly choosing much smaller subsets of the data population that current algorithms can handle. In the present work, we pursue the second alternative to address the size challenge in the big data context. We propose intelligent sampling techniques based on Scalable Simple Random Sampling (ScaSRS) and the Subsampled Double Bootstrap (SDB). Tests carried out on public generic datasets show that our proposal addresses the size dimension efficiently. The proposed algorithms were evaluated on a classification task, where the obtained results show a significant improvement over the state of the art.
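To make the subsampling alternative concrete, the sketch below draws a simple random subset of a large dataset and trains a classifier on it. This is only an illustration of alternative (2) above, not the ScaSRS or SDB procedures proposed in the paper; the synthetic dataset, sampling fraction, and model choice are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def simple_random_subsample(X, y, fraction=0.01, seed=0):
    """Draw a uniform random subset of (X, y) without replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k = max(1, int(fraction * n))
    idx = rng.choice(n, size=k, replace=False)
    return X[idx], y[idx]

# Illustrative usage on synthetic data (a stand-in for a big dataset).
X = np.random.rand(1_000_000, 20)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Train on the much smaller random subset instead of the full data.
X_small, y_small = simple_random_subsample(X, y, fraction=0.01)
clf = LogisticRegression(max_iter=1000).fit(X_small, y_small)
```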
