Abstract

Decision tree-based classification is a popular approach for pattern recognition and data mining. Most decision tree induction methods assume that the training data reside at one central location. Given the growth of distributed databases at geographically dispersed locations, methods for decision tree induction in distributed settings are gaining importance. This paper describes a distributed learning algorithm that extends the original (centralized) CHAID algorithm to a distributed setting. The distributed algorithm generates exactly the same results as its centralized counterpart. For completeness, a distributed quantization method is proposed so that continuous attributes can be processed by our algorithm. Experimental results for several well-known data sets are presented and compared with decision trees generated by CHAID on centrally stored data.
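For context, classical CHAID selects the split at each node by testing independence between a candidate predictor and the class label with a chi-square statistic over their contingency table. The sketch below illustrates only that split-selection idea; it is not the paper's distributed implementation, and the function names (`chi_square`, `best_chaid_split`) and the data layout (a list of record dictionaries) are illustrative assumptions. Full CHAID additionally merges predictor categories and applies significance testing with a Bonferroni adjustment, which are omitted here for brevity.

```python
from collections import Counter

def chi_square(table):
    """Chi-square statistic for a contingency table given as
    {predictor_category: {class_label: count}}."""
    row_totals = {r: sum(cols.values()) for r, cols in table.items()}
    col_totals = Counter()
    for cols in table.values():
        col_totals.update(cols)
    total = sum(row_totals.values())
    stat = 0.0
    for r, cols in table.items():
        for c, col_total in col_totals.items():
            expected = row_totals[r] * col_total / total
            observed = cols.get(c, 0)
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def best_chaid_split(records, attributes, label):
    """Pick the categorical attribute whose split has the largest
    chi-square statistic against the class label."""
    best_attr, best_stat = None, -1.0
    for attr in attributes:
        # Build the contingency table: attribute value x class label.
        table = {}
        for rec in records:
            table.setdefault(rec[attr], Counter())[rec[label]] += 1
        stat = chi_square(table)
        if stat > best_stat:
            best_attr, best_stat = attr, stat
    return best_attr, best_stat
```

Because the chi-square statistic is built from contingency-table counts, such counts can in principle be accumulated per site and summed, which is the kind of property a distributed extension can exploit to reproduce the centralized tree exactly.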
