Abstract

MapReduce is a programming model for parallel computation over big data in the public cloud. Big data is characterized by variety, velocity, and volume. This work implements MapReduce in MATLAB, a powerful tool for image processing and numeric computation. It treats the unstructured image data stored in the public cloud service Dropbox as big data and applies a MapReduce algorithm to map and reduce all of the stored images. The aim is to retrieve the images in the public cloud with the maximum red, green, and blue color content, along with the colors that intersect among them. The code is then modified to find all red, green, and blue values in a form that supports more parallelism, which improves the speed of MapReduce by eliminating the dependency between iterations. The speed of parallel MapReduce shows considerable improvement only with increased file size and an appropriate coding style. Parallel MapReduce computation is carried out with the default, three, and four workers of the local cluster in a scale-up architecture. The model is developed in MATLAB and can be implemented in Hadoop as well.
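The map/reduce scheme the abstract describes can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB implementation: each image is mapped to its mean red, green, and blue intensities, the shuffle phase groups values by color channel, and the reduce phase keeps the image with the maximum mean per channel. The image names and pixel data are hypothetical stand-ins for files stored in Dropbox.

```python
def map_image(name, pixels):
    """Map step: emit (channel_label, (mean_intensity, image_name)) pairs."""
    n = len(pixels)
    for ch, label in enumerate(("Red", "Green", "Blue")):
        mean = sum(p[ch] for p in pixels) / n
        yield label, (mean, name)

def reduce_channel(label, values):
    """Reduce step: keep the image with the maximum mean for this channel."""
    return label, max(values)

# Synthetic "images" (lists of (R, G, B) pixels) standing in for cloud files.
images = {
    "img1.jpg": [(200, 10, 10), (180, 20, 30)],
    "img2.jpg": [(10, 220, 10), (20, 200, 40)],
    "img3.jpg": [(10, 10, 240), (30, 20, 220)],
}

# Shuffle phase: group the mapped values by channel key.
grouped = {}
for name, pixels in images.items():
    for label, value in map_image(name, pixels):
        grouped.setdefault(label, []).append(value)

# Reduce phase: one (mean, image) result per channel.
result = dict(reduce_channel(k, v) for k, v in grouped.items())
print(result)
```

Because each image is mapped independently, the map step has no dependency between iterations, which is the property the paper exploits to run the computation in parallel across local-cluster workers.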
