Abstract

Decision tree-based classification is a popular approach for pattern recognition and data mining. Most decision tree induction methods assume that the training data resides at one central location. Given the growth of distributed databases at geographically dispersed locations, methods for decision tree induction in distributed settings are gaining importance. This paper extends two well-known decision tree methods from centralized data to distributed data settings. The first method is an extension of the CHAID algorithm and generates single-feature, multi-way split decision trees. The second method is based on Fisher's linear discriminant (FLD) function and generates multi-feature binary trees. Both methods aim to generate compact trees and can handle multiple classes. The proposed extensions for distributed environments are compared to their centralized counterparts and to each other. Theoretical analysis and experimental tests demonstrate the effectiveness of the extensions. In addition, the side-by-side comparison highlights the advantages and deficiencies of the two methods under different settings of the distributed environments.
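To make the second method concrete: an FLD-based tree typically computes, at each internal node, a linear projection that best separates two groups of classes and thresholds the projected values to form a binary split. The sketch below shows one such node-level split for two classes; the function name, the regularization term, and the midpoint threshold rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fld_split(X, y):
    """Compute a Fisher linear discriminant binary split for two classes.

    Returns a projection direction w and a threshold t; samples with
    X @ w <= t are routed to the left child, the rest to the right.
    """
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix, lightly regularized so it is invertible
    # even when features are degenerate (illustrative choice).
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw += 1e-6 * np.eye(X.shape[1])
    # Fisher direction: w = Sw^{-1} (m1 - m0).
    w = np.linalg.solve(Sw, m1 - m0)
    # Threshold at the midpoint of the projected class means.
    t = 0.5 * (m0 @ w + m1 @ w)
    return w, t

# Small demonstration on two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
Xa = rng.normal([0.0, 0.0], 0.3, size=(50, 2))
Xb = rng.normal([3.0, 3.0], 0.3, size=(50, 2))
X = np.vstack([Xa, Xb])
y = np.array([0] * 50 + [1] * 50)

w, t = fld_split(X, y)
pred = (X @ w > t).astype(int)
accuracy = (pred == y).mean()
```

Because the split is a hyperplane over all features rather than an axis-parallel cut on one feature, FLD-based trees tend to need fewer nodes than single-feature trees on obliquely separable data, which matches the paper's emphasis on compact trees.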
