Abstract
We provide a unifying perspective on two decades of work on cost-sensitive Boosting algorithms. Analyzing the literature from 1997 to 2016, we find 15 distinct cost-sensitive variants of the original algorithm; each of these has its own motivation and claims superiority, so who should we believe? In this work we critique the Boosting literature using four theoretical frameworks: Bayesian decision theory, the functional gradient descent view, margin theory, and probabilistic modelling. Our finding is that only three algorithms are fully supported, and the probabilistic model view suggests that all require their outputs to be calibrated for best performance. Experiments on 18 datasets across 21 degrees of imbalance support this hypothesis: once calibrated, the three perform equivalently and outperform all others. Our final recommendation, based on simplicity, flexibility and performance, is to use the original AdaBoost algorithm with a shifted decision threshold and calibrated probability estimates.
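The final recommendation translates almost directly into code. Below is a minimal sketch of that pipeline, assuming scikit-learn's AdaBoostClassifier and CalibratedClassifierCV; the synthetic dataset, calibration method and hyperparameters are illustrative assumptions, not choices taken from the paper:

```python
# Sketch of the recommended pipeline: plain AdaBoost, calibrated
# probability estimates, and a decision threshold shifted by the costs.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

c_FP, c_FN = 2.0, 1.0             # a false positive costs twice a false negative
threshold = c_FP / (c_FP + c_FN)  # Bayes-optimal threshold: 2/3 instead of 1/2

# Illustrative imbalanced data; replace with your own task.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Calibrate AdaBoost's scores into probability estimates (Platt scaling here).
model = CalibratedClassifierCV(
    AdaBoostClassifier(n_estimators=100, random_state=0),
    method="sigmoid",
    cv=5,
)
model.fit(X_train, y_train)

# Predict positive only when the calibrated probability clears the shifted threshold.
p_pos = model.predict_proba(X_test)[:, 1]
y_pred = (p_pos >= threshold).astype(int)
```

Note that no cost-sensitive modification of AdaBoost itself is involved: the costs enter only through the threshold, and calibration does the rest.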
Highlights
Cost-sensitive prediction tasks are everywhere in real-life applications, e.g. medical applications where false positives are dangerous, or rare classes in astrophysical data where a false negative can mean missing a key scientific observation
At the time of writing this article, we identify 15 distinct variants proposed in a sequence of papers (Landesa-Vázquez and Alba-Castro 2012, 2013; Masnadi-Shirazi and Vasconcelos 2007, 2011; Sun et al. 2005, 2007; Viola and Jones 2002; Ting 2000; Fan et al. 1999) published 1997–2016
Suppose that the false positive cost c_FP associated with misclassifying negatives is twice the false negative cost c_FN; a worked decision rule for this setting appears after this list
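To make that cost ratio concrete, here is the standard Bayesian decision theory argument for a shifted threshold (a sketch added for illustration, writing p = P(y = +1 | x) for the positive-class probability; this notation is ours, not the paper's):

```latex
% Predict positive exactly when its expected cost does not exceed
% the expected cost of predicting negative:
\[
  \hat{y}(x) = +1
  \quad\Longleftrightarrow\quad
  c_{FP}\,(1 - p) \le c_{FN}\,p
  \quad\Longleftrightarrow\quad
  p \ge \frac{c_{FP}}{c_{FP} + c_{FN}} .
\]
```

With c_FP = 2c_FN the threshold becomes 2/3 rather than the cost-insensitive default of 1/2, which is precisely the kind of shifted decision threshold the abstract recommends.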
Summary
Cost-sensitive prediction tasks are everywhere in real-life applications, e.g. medical applications where false positives are dangerous, or rare classes in astrophysical data where a false negative can mean missing a key scientific observation. The AdaBoost algorithm (Freund and Schapire 1997) stands out in the field of ensemble learning: it was named in a community survey as one of the top ten algorithms in data mining (Wu et al. 2008), while having a rich theoretical depth that won its authors the 2003 Gödel Prize. It is no surprise that significant international research effort has been dedicated to adapting AdaBoost for cost-sensitive tasks. Suppose that the false positive cost c_FP associated with misclassifying negatives is twice the false negative cost c_FN. This can be simulated by an adjusted class prior that duplicates every negative, leading to a new negative prior π′_− = 2π_−/(1 + π_−).
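The arithmetic behind that adjusted prior is a one-line derivation; writing N_+ and N_− for the class counts (notation introduced here for illustration), duplicating every negative gives

```latex
\[
  \pi'_{-}
  = \frac{2N_{-}}{N_{+} + 2N_{-}}
  = \frac{2\pi_{-}}{\pi_{+} + 2\pi_{-}}
  = \frac{2\pi_{-}}{1 + \pi_{-}},
  \qquad \text{since } \pi_{+} + \pi_{-} = 1 .
\]
```

For a balanced dataset (π_− = 1/2), the simulated prior becomes π′_− = 2/3, mirroring the 2:1 cost ratio.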