Abstract

The average volume of data produced daily is estimated to be over 2.5 quintillion bytes. Moreover, it is estimated that by 2020, 1.79 MB of data will be created every second for each person in the world. Clearly, big datasets contain a tremendous amount of valuable information that can be used for improved decision making. However, big data requires an enormous amount of storage and computational resources for effective processing. Machine Learning (ML) algorithms are effective tools widely used to analyze datasets and extract hidden insights from them. However, some ML algorithms were not originally designed to handle big datasets, so their computational efficiency decreases as data size increases. This makes big data analytics extremely slow or impractical. There is therefore a clear need for fast and effective techniques for big data analytics. This paper introduces an intelligent hybrid ML-based technique suitable for big data analytics, called EDISA_ML. EDISA_ML is a boundary detection and instance selection algorithm inspired by edge detection in image processing. It was evaluated with four ML algorithms on big datasets, and the results show that it achieved a storage reduction of over 50% while improving the training speed of the evaluated ML algorithms by over 93% in some cases, without meaningfully affecting their prediction accuracy.
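To make the idea of boundary-based instance selection concrete, the sketch below shows a generic heuristic: an instance is kept only if at least one of its k nearest neighbors carries a different class label, so that instances deep inside a class region are discarded. This is an illustrative assumption, not the EDISA_ML algorithm, whose details are not given in the abstract; the function name `select_boundary_instances`, the parameter `k`, and the toy data are all hypothetical.

```python
from math import dist

def select_boundary_instances(points, labels, k=2):
    """Return indices of instances that have at least one differently
    labeled instance among their k nearest neighbors.

    Generic boundary heuristic for illustration only; it is NOT the
    EDISA_ML procedure from the paper.
    """
    kept = []
    for i, p in enumerate(points):
        # Rank all other instances by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        # Keep p if any of its k nearest neighbors belongs to another class.
        if any(labels[j] != labels[i] for j in others[:k]):
            kept.append(i)
    return kept

# Hypothetical toy data: class 0 on the left, class 1 on the right.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,), (7.0,)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
kept = select_boundary_instances(points, labels, k=2)
```

On this toy data only the two instances adjacent to the class boundary (indices 3 and 4) are retained, which illustrates how such a selection step can shrink a training set. The brute-force neighbor search here is quadratic in the number of instances, so a method aimed at big data would need a more scalable way to locate boundary instances.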

