DEVELOPMENT OF AN ENHANCED C4.5 DECISION TREE ALGORITHM USING A MEMOIZED MAPREDUCE MODEL

Florence Paul, D Elaoyi Paul, F Armand Donfack-Kana, A Afolayan Obiniyi

doi:10.33003/fjs-2023-0705-1691

Open Access

Journal: FUDMA JOURNAL OF SCIENCES	Publication Date: Nov 4, 2023
Citations: 1	License type: CC BY 4.0

Affiliation: Ahmadu Bello University

Abstract

Classification in data mining concerns the prediction of categorical or discrete target variables. The classical C4.5 decision tree algorithm is designed for this task: it aims to produce a tree that accurately predicts the target variable for new, unseen data. However, its recursive nature becomes a limitation when large volumes of data are involved, making computation more complex and the implementation inefficient in terms of computing time, memory utilization and data complexity. Several studies have sought to address these limitations. One such improvement is the parallelization of the algorithm using the MapReduce model, which divides the large dataset into smaller units and distributes them across multiple computers for parallel processing. Even so, the recursive nature of the algorithm makes the cost of performing a large number of repeated calculations quite high, and this is the concern of this work. This research aims to reduce computation time further by using a memoized MapReduce model, which saves the results of previous calculations in a cache; when the same calculation is encountered again, the cached result is returned and re-computation is avoided. Retrieving a cached result is considered cheaper than the computational cost of re-computation.
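The caching idea described in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: the function names (`entropy`, `counts_key`, `map_partition`, `reduce_gain_ratio`), the toy dataset, and the use of Python's `functools.lru_cache` as the cache are all assumptions made for illustration, and no actual MapReduce framework is involved. The sketch computes the C4.5 gain ratio per attribute and caches entropy values keyed by class counts, so that repeated count patterns are looked up instead of recomputed.

```python
import math
from collections import Counter
from functools import lru_cache


@lru_cache(maxsize=None)
def entropy(class_counts):
    """Shannon entropy (in bits) of a tuple of counts; results are memoized."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


def counts_key(labels):
    """Hashable, order-independent cache key built from a list of class labels."""
    return tuple(sorted(Counter(labels).values()))


def map_partition(rows, attr_index):
    """Hypothetical map step: emit (attribute value, class label) pairs."""
    return [(row[attr_index], row[-1]) for row in rows]


def reduce_gain_ratio(pairs):
    """Hypothetical reduce step: C4.5 gain ratio of one attribute."""
    labels = [label for _, label in pairs]
    groups = {}
    for value, label in pairs:
        groups.setdefault(value, []).append(label)
    n = len(labels)
    base = entropy(counts_key(labels))
    # Weighted entropy of the subsets induced by each attribute value.
    remainder = sum(len(g) / n * entropy(counts_key(g)) for g in groups.values())
    # Split information is the entropy of the subset sizes themselves.
    split_info = entropy(tuple(sorted(len(g) for g in groups.values())))
    gain = base - remainder
    return gain / split_info if split_info > 0 else 0.0


# Toy data (assumed): two attributes followed by the class label.
rows = [
    ("sunny", "high", "no"),
    ("sunny", "normal", "yes"),
    ("rain", "high", "no"),
    ("rain", "normal", "yes"),
]
ratios = [reduce_gain_ratio(map_partition(rows, i)) for i in range(2)]
cache_stats = entropy.cache_info()  # hits count the avoided recomputations
```

Keying the cache on sorted class counts rather than on the raw rows is one possible design choice: different subsets with the same count pattern share a single cached entropy value. In a distributed setting the cache would have to be shared or replicated across workers, which this sketch does not address.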