Abstract

Classification in data mining concerns the prediction of categorical (discrete) target variables. The classical C4.5 decision tree algorithm is designed for this task: it aims to produce a tree that accurately predicts the target variable for new, unseen data. However, its recursive nature becomes a limitation on very large datasets, where it makes the computation more complex and the implementation inefficient in terms of computing time, memory utilization and data complexity. Several studies have sought to address these limitations. One such improvement parallelizes the algorithm using the MapReduce model, which divides the large dataset into smaller units and distributes them across multiple computers for parallel processing. Even so, the recursive nature of the algorithm means that a large number of calculations are repeated, and the cost of computing them again is high; this is the concern of this work. This research aims to reduce computation time further by using a memoized MapReduce model, which saves the results of previous calculations in a cache. When the same calculation is encountered again, the cached result is returned and re-computation is avoided. Retrieving a cached result is considered cheaper than recomputing it.
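The memoization idea can be illustrated with a minimal sketch. This is not the authors' implementation: it is a single-process stand-in for MapReduce, and the function names (`map_count`, `reduce_counts`, `node_entropy`) as well as the choice of class entropy as the cached calculation are assumptions made for illustration. The map step counts class labels per partition, the reduce step merges the counts, and the entropy of the merged distribution is cached so that a repeated distribution is looked up instead of recomputed.

```python
from collections import Counter
from functools import lru_cache
from math import log2


@lru_cache(maxsize=None)
def entropy(counts):
    """Shannon entropy of a class distribution given as a tuple of counts.

    The tuple is hashable, so lru_cache can use it as the cache key.
    """
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def map_count(partition):
    """Map step (illustrative): count class labels in one data partition."""
    return Counter(label for _, label in partition)


def reduce_counts(partials):
    """Reduce step (illustrative): merge the per-partition label counts."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged


def node_entropy(partitions):
    """Entropy at a tree node whose records are spread over several partitions."""
    merged = reduce_counts(map_count(p) for p in partitions)
    # Sort the counts so that identical distributions share one cache entry.
    key = tuple(sorted(merged.values()))
    return entropy(key)
```

Calling `node_entropy` twice on data with the same class distribution computes the entropy once and serves the second call from the cache; `entropy.cache_info()` reports the hit and miss counts. In a real distributed setting the cache would have to be shared or replicated across workers, which this sketch does not attempt.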
