Big Data and Machine Learning Integration: The Benefits and Research Issues in the Huge Data Processing

doi:10.35940/ijrte.b1281.0982s1119

Abstract

Data generated by everyone from individual users to multinational corporations (MNCs) places a growing burden on existing architectures. Existing storage and processing techniques may not meet the current requirements for storing and processing such huge volumes of data. The fundamental issue is the kind of data generated every second on social media, which reaches petabytes of storage; processing this data is a further problem. This is where big data comes into the picture. Hadoop is a framework that stores huge amounts of data and processes it in a parallel and distributed manner. The framework combines the Hadoop Distributed File System (HDFS) and MapReduce (MR). HDFS is a distributed storage layer whose large capacity addresses the problem of rapid data growth, whereas MapReduce handles the processing and provides a versatile model for working on huge amounts of data. The other dimension of the current work is the analysis of huge amounts of data, which is beyond the scope of Hadoop-based tools. Machine learning (ML) is a class of algorithms that provides various techniques for analyzing such data more effectively. ML provides classification techniques, clustering mechanisms, and recommender systems, to name a few. The contribution of the current work is the integration of Hadoop and R, which in turn combines big data and ML. The work presents the key benefits of such integration and its future scope, along with the research constraints likely to arise in practice. We believe the work gives researchers a platform from which to explore the future scope of the integration and the difficulties faced in the process.
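
To make the MapReduce processing model named in the abstract concrete, the following is a minimal single-process sketch in Python. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, run_job) and the word-count task are illustrative assumptions and do not come from the paper. In an actual Hadoop job, the map and reduce tasks would run in parallel on cluster nodes and read their input from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence (hypothetical local driver)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical input standing in for records stored in HDFS.
    posts = ["big data needs big storage", "data needs processing"]
    print(run_job(posts))
```

Because each call to map_phase depends on only one record, and each call to reduce_phase on only one key, both steps can in principle be distributed across many machines, which is the property that makes the model suitable for very large inputs.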

Full Text