Very Fast Decision Tree (VFDT) algorithm on Hadoop

Sharmishta Desai,Minal Kucheria,Sourav Roy,Brina Patel,Samruddhi Purandare

doi:10.1109/iccubea.2016.7860037

Abstract

In the era of Big Data where voluminous data is handled on a very large scale, traditional decision trees might be very time consuming and sometimes might even fail to work owing to its dataset size. Handling Big Data can also be a costly affair because of its high demand for memory and other hardware requirements. To the end of this paper, we have chosen a decision tree algorithm named Very Fast Decision Tree (VFDT) after comparing it with other decision tree algorithms like ID3 and C4.5. We have also proposed an algorithm for implementing VFDT on a Distributed Environment called Hadoop. This implementation can form a base for a large number of applications for handling Big Data. We have replaced the serial execution of VFDT algorithm by a series of Map and Reduce functions. We have also conducted an extensive analysis on various datasets which have proved our proposed algorithm to be more efficient in terms of time compared to the other existing decision tree models.

Full Text