Abstract

A computer system has three basic types of resources: software, hardware, and data. Data is the most important resource, because all computing ultimately exists to process it. Data Science deals with large data sets, inferring knowledge from them to derive business value. Traditionally, information was structured, generally generated by business transactions, and could be processed easily by traditional data management systems. In the past few years, however, the amount of digital data generated has grown exponentially. Much of this data is unstructured and cannot be processed or extracted efficiently by traditional systems; it includes text files, sensor data, log data, web data, social networking data, etc. The major sources of this unstructured data are applications used via the Internet, e.g. smart devices, the web, mobile, social media, and sensor devices. Mining this data is essential for achieving business goals. This large volume of varied, unstructured data is called Big Data [1]. Various tools are available for processing data at this scale, and Hadoop is one of the most popular and efficient. Hadoop provides a framework for distributed computing that runs tasks in parallel, so that such complex data can be processed efficiently with respect to time, performance, and resources. This paper covers the major resources used by a Hadoop cluster.
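The parallel processing model the abstract attributes to Hadoop is MapReduce: a map phase that emits key/value pairs from raw input, and a reduce phase that aggregates values per key. A minimal sketch of that pattern, simulated sequentially in plain Python on a hypothetical word-count task (Hadoop itself would shard the input and run these phases across cluster nodes):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    # In Hadoop, many mappers would run this over different input splits.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Reduce: sum the counts for each distinct key (word).
    # In Hadoop, pairs are shuffled so each reducer sees one key group.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

if __name__ == "__main__":
    # Illustrative input only; real jobs read from HDFS.
    data = ["big data needs big tools", "hadoop processes big data"]
    print(reduce_phase(map_phase(data)))
```

This sequential version shows only the logic; Hadoop's contribution is distributing the map and reduce work across machines and handling data locality and failures.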
