Abstract

Conventional database management systems cannot handle huge volumes of data; storing, processing, and accessing such massive datasets is made possible by Big Data technologies. In this paper we discuss the Hadoop Distributed File System (HDFS) and the MapReduce architecture for storing and retrieving information from massive datasets. We present a WordCount application in the MapReduce object-oriented programming paradigm. It divides the input file into splits, or tokens, using the java.util.StringTokenizer class, and the output is represented as <key, value> pairs. Experiments were conducted on the Hadoop framework by loading a large number of input files and evaluating the framework's performance with respect to the MapReduce programming paradigm. We examine the performance of the map and reduce tasks as the number of loaded files grows, along with the read-write operations performed by these jobs.

Keywords: Hadoop, HDFS, Job Tracker, MapReduce, NameNode
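The WordCount logic the abstract describes can be sketched, under the assumption of the standard tokenize-then-count scheme, as a simplified, Hadoop-free Java program: the map phase tokenizes each line with java.util.StringTokenizer and emits <word, 1> pairs, and the reduce phase sums the counts per word. The class and method names here are illustrative, not the paper's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Simplified sketch of MapReduce WordCount without the Hadoop runtime:
// map emits (word, 1) per token; reduce sums the 1s per distinct word.
public class WordCountSketch {

    // Map and reduce combined for illustration: tokenize each input
    // line with StringTokenizer, then accumulate per-word counts.
    static Map<String, Integer> mapAndReduce(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                // Reduce step: sum the emitted 1s for this key.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] input = { "big data big hadoop", "hadoop big" };
        // Each entry of the result is one <key, value> pair.
        System.out.println(mapAndReduce(input));
    }
}
```

In the real Hadoop job, the map and reduce phases run as separate tasks on different nodes and Hadoop performs the shuffle-and-sort between them; this sketch collapses both phases into one in-memory pass to show the <key, value> counting idea.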

