MapReduce is one of the essential programming models for parallel processing of enormous data sets held in distributed storage. The default Hadoop implementation assumes that the executing nodes are homogeneous. Data locality is an important feature that Hadoop introduced to improve the performance of the traditional MapReduce model. The key idea is to move the map task to, or close to, the node where the data resides rather than transferring the vast data set to the computation. Data locality lowers network congestion and improves performance. However, this practice breaks down when the data is processed in a heterogeneous Hadoop cluster, where nodes with different computational capabilities pose a crucial challenge: nodes with faster processing capacity finish their tasks sooner than nodes with slower processing ability. This paper proposes a KNN-based scheduler that focuses on speculative prefetching and clustering of the data. The process starts with speculative prefetching and then performs KNN clustering on the intermediate map output before directing it to the reducer for final processing. The scheduler's performance is analysed by executing different workloads such as WordCount, RandomText, RandomNum, and Sort. The results show that the proposed idea improves the performance of job execution.
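The effect of heterogeneity on a locality-driven assignment can be illustrated with a short calculation. The sketch below is not the proposed scheduler. It assumes one equal-sized split per node, processed locally, and uses hypothetical node speeds and helper names (`task_times`, `makespan`, `idle_times`) to show that the map phase ends only when the slowest node finishes, leaving faster nodes idle.

```python
from typing import List


def task_times(split_mb: float, speeds_mb_per_s: List[float]) -> List[float]:
    """Time each node needs to process one local split of size split_mb."""
    return [split_mb / speed for speed in speeds_mb_per_s]


def makespan(split_mb: float, speeds_mb_per_s: List[float]) -> float:
    """The map phase ends only when the slowest node has finished its split."""
    return max(task_times(split_mb, speeds_mb_per_s))


def idle_times(split_mb: float, speeds_mb_per_s: List[float]) -> List[float]:
    """Time each node waits for the slowest node after finishing its own split."""
    end = makespan(split_mb, speeds_mb_per_s)
    return [end - t for t in task_times(split_mb, speeds_mb_per_s)]


if __name__ == "__main__":
    split_mb = 100.0  # hypothetical split size
    homogeneous = [50.0, 50.0, 50.0, 50.0]      # hypothetical speeds in MB/s
    heterogeneous = [50.0, 50.0, 25.0, 100.0]   # hypothetical speeds in MB/s

    print("homogeneous makespan (s):  ", makespan(split_mb, homogeneous))
    print("heterogeneous makespan (s):", makespan(split_mb, heterogeneous))
    print("heterogeneous idle (s):    ", idle_times(split_mb, heterogeneous))
```

Under these assumed numbers, the total processing capacity of the two clusters is close, yet the heterogeneous cluster takes twice as long because a purely locality-based placement ignores node speed.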