Abstract

We explore the relationship between diversity measures and ensemble performance, for binary classification with simple majority voting, within a problem domain characterized by asymmetric misclassification costs. Extending the work of Kuncheva and Whitaker [Machine Learning 51(2) (2003) 181], we compare a set of diversity measures within two different data representations. The first is a direct representation, which explicitly allows for consideration of asymmetric costs by indicating the specific values of the predictions––which in turn allows for a distinction between more costly misclassifications in this domain (i.e., actual 0 predicted as 1) and less costly ones (i.e., actual 1 predicted as 0). The second is an oracle representation, which indicates predictions as either correct or incorrect, and therefore does not allow for asymmetric costs. Within these representations we identified and manipulated certain situational factors, including the percentage of target group members in the population and the designed accuracy and sensitivity of each constituent model. Based on a neural network comparison of diversity measures and ensemble performance, we found that (1) diversity measure association with ensemble performance is contingent on the data representation, with Yule's Q -statistic and the coincident failure measure (CFD) as the best indicators in the direct representation and CFD alone as best indicator in the oracle representation, and (2) diversity measure association with ensemble performance varies as situational factors are manipulated; that is, diversity measures are differentially effective at different factor levels. Thus, the choice of a diversity measure in assessing ensemble classification performance requires an examination of both the nature of the task domain and the specific factors that comprise the domain.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.