SC‐OCR: similarity‐based clustering and optimum cache replacement approach

Sabitha Malli Subramanian, Vijayalakshmi Soundarajan

doi:10.1002/cpe.3916

Abstract

Big data refers to large‐scale, complex datasets. It is expanding rapidly across all science and engineering domains, owing to the fast development of networking, data storage, and data collection capacity. Big data mining is the capability of extracting useful information from these large datasets. Integrating cloud computing with data mining for big data remains a challenging task, and processing such large volumes of data requires improvements in big data computation. Most existing approaches use MapReduce to compute big data; their main drawbacks are increased computational cost and memory consumption. To overcome these limitations, this paper proposes a similarity‐based clustering and optimum cache replacement (SC‐OCR) approach for big data computing applications. The job recovery process begins by copying the data on the cloud server and forwarding the copy for further processing. The job is then divided into clusters using the similarity‐based clustering approach. Finally, a cache governed by the optimum cache replacement algorithm is introduced to avoid repeated execution of jobs through queue management. The proposed approach is compared with the existing Spark and Hadoop approaches and achieves better performance in terms of iteration time, query response time, job completion time, and clustering accuracy. Copyright © 2016 John Wiley & Sons, Ltd.
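
The idea of caching job results so that a repeated job is not executed again can be illustrated with a minimal sketch. The abstract does not specify the optimum cache replacement algorithm, so the sketch below substitutes a plain least‐recently‐used (LRU) eviction policy as a stand‐in; the class name `JobResultCache`, its `run` method, and the string job keys are hypothetical and are not taken from the paper.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class JobResultCache:
    """Bounded cache of job results.

    Eviction here is LRU, an assumed stand-in for the paper's
    optimum cache replacement algorithm, which the abstract does not define.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._results: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.executions = 0  # number of times a job was actually executed

    def run(self, job_key: Hashable, job: Callable[[], Any]) -> Any:
        """Return the cached result for job_key, executing job only on a miss."""
        if job_key in self._results:
            # Cache hit: mark the entry as most recently used and skip execution.
            self._results.move_to_end(job_key)
            return self._results[job_key]

        # Cache miss: execute the job and store its result.
        result = job()
        self.executions += 1
        self._results[job_key] = result

        # Evict the least recently used entry once capacity is exceeded.
        if len(self._results) > self.capacity:
            self._results.popitem(last=False)
        return result


cache = JobResultCache(capacity=2)
cache.run("wordcount:part-0", lambda: 42)  # miss, the job executes
cache.run("wordcount:part-0", lambda: 42)  # hit, the job is not executed again
```

In this sketch a job is identified only by its key, so two submissions with the same key are treated as the same job. Any other replacement policy could be swapped in by changing the eviction step alone.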
