Abstract

In decision tree construction, the choice of splitting criterion strongly affects the quality of the resulting model. In this paper, decision tree models are built on dispersed data using the entropy measure and the twoing criterion as splitting criteria. Dispersed data here means a set of multiple independent local tables, on each of which decision tree models are built. Prediction vectors are generated from the local models, and the final prediction is obtained by aggregating them through majority voting. To improve model quality, an ensemble method (bagging) is applied to build multiple models for each local table. The main purpose of the paper is to compare the classification quality of decision tree models built on dispersed data using the entropy and twoing splitting measures. The main observation is that when knowledge is highly dispersed across many local tables, building decision tree models with the twoing criterion gives better classification quality than using the entropy measure.
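To make the two splitting measures and the aggregation step concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard CART definition of the twoing value for a binary split, (P_L * P_R / 4) * (sum over classes k of |p(k|L) - p(k|R)|)^2, and entropy-based information gain. The function names (`entropy`, `information_gain`, `twoing`, `majority_vote`) are hypothetical, chosen for illustration. Tree induction and bagging are omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(left, right):
    """Entropy reduction obtained by splitting a node into left and right."""
    parent = left + right
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def twoing(left, right):
    """Twoing value of a binary split (standard CART form, assumed here)."""
    if not left or not right:
        return 0.0  # a split with an empty side separates nothing
    n = len(left) + len(right)
    p_left, p_right = len(left) / n, len(right) / n
    left_counts, right_counts = Counter(left), Counter(right)
    classes = set(left_counts) | set(right_counts)
    # Sum of absolute differences between class proportions in each branch
    diff = sum(
        abs(left_counts[k] / len(left) - right_counts[k] / len(right))
        for k in classes
    )
    return p_left * p_right / 4 * diff ** 2


def majority_vote(predictions):
    """Most frequent label among the local models' predictions.

    Ties go to the label seen first; this tie rule is an assumption.
    """
    return Counter(predictions).most_common(1)[0][0]
```

For example, under these assumptions a split that separates two classes perfectly, `information_gain(["a", "a"], ["b", "b"])`, evaluates to 1 bit, and `twoing(["a", "a"], ["b", "b"])` evaluates to 0.25, the maximum for a balanced split. `majority_vote` would be applied to the vector of labels predicted by the models built on the local tables.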
