A New Approach for Large-Scale Scene Image Retrieval Based on Improved Parallelk-Means Algorithm in MapReduce Environment

Jianfang Cao,Yun Tian,Hao Shi,Min Wang,Guohua Hu

doi:10.1155/2016/3593975

Abstract

The rapid growth of digital images has caused the traditional image retrieval technology to be faced with new challenge. In this paper we introduce a new approach for large-scale scene image retrieval to solve the problems of massive image processing using traditional image retrieval methods. First, we improved traditionalk-Means clustering algorithm, which optimized the selection of the initial cluster centers and iteration procedure. Second, we presented a parallel design and realization method for improvedk-Means algorithm applied it to feature clustering of scene images. Finally, a storage and retrieval scheme for large-scale scene images was put forward using the large storage capacity and powerful parallel computing ability of the Hadoop distributed platform. The experimental results demonstrated that the proposed method achieved good performance. Compared with the traditional algorithms with single node architecture and parallelk-Means algorithm, the proposed method has obvious advantages for use in large-scale scene image data retrieval in terms of retrieval accuracy, retrieval time overhead, and computational performance (speedup and efficiency, sizeup, and scaleup), which is a significant improvement from applying parallel processing to intelligent algorithms with large-scale datasets.

Full Text