Abstract

Some supervised tasks have a numerical output, but decisions have to be made in a discrete, binarised way, according to a particular cutoff. This binarised regression task is a very common situation that requires its own analysis, different from regression and classification, and also from ordinal regression. We first investigate the application cases in terms of the information available about the distribution and range of the cutoffs, and distinguish six possible scenarios, some of which are more common than others. Next, we study two basic approaches: the retraining approach, which discretises the training set whenever the cutoff is available and learns a new classifier from it, and the reframing approach, which learns a regression model and sets the cutoff when this becomes available during deployment. In order to assess the binarised regression task, we introduce context plots featuring error against cutoff. Two special cases are of interest: the UCE and OCE curves. We show that the area under the former is the mean absolute error, and that the area under the latter is a new metric that lies in between a ranking measure and a residual-based measure. A comprehensive evaluation of the retraining and reframing approaches is performed using a purpose-built repository of binarised regression problems, concluding that neither method is clearly better than the other, except when the training set is small.
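The two approaches, and the link between an error-against-cutoff curve and the mean absolute error, can be illustrated with a small sketch. This is not the authors' code, and its definitions are our assumptions. The curve is taken to be the 0/1 error of a reframed regression model when both true and predicted values are thresholded at the same cutoff, integrated over all cutoffs without normalisation. Reframing uses a one-feature least-squares model. Retraining uses a decision stump, an arbitrary stand-in for "a new classifier". All function names, data and the cutoff value are hypothetical.

```python
"""Toy sketch (hypothetical data and names): retraining vs reframing at a
cutoff, and an error-against-cutoff curve whose area equals the MAE."""


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def reframe(predict, cutoff):
    """Reframing: reuse one regression model; threshold its output at deployment."""
    return lambda x: predict(x) > cutoff


def retrain(xs, ys, cutoff):
    """Retraining: binarise the training labels at `cutoff`, then fit a new
    classifier. Here it is a 1-D decision stump (an arbitrary choice)."""
    labels = [y > cutoff for y in ys]
    best = None
    for t in sorted(set(xs)):
        for positive_above in (True, False):
            errors = sum(((x > t) == positive_above) != lab
                         for x, lab in zip(xs, labels))
            if best is None or errors < best[0]:
                best = (errors, t, positive_above)
    _, t, positive_above = best
    return lambda x: (x > t) == positive_above


def binarised_error(y_true, y_pred, cutoff):
    """Fraction of examples whose true and predicted values fall on opposite
    sides of `cutoff` (0/1 loss after binarising both)."""
    wrong = sum((y > cutoff) != (p > cutoff) for y, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def area_under_error_curve(y_true, y_pred):
    """Integral of binarised_error over every cutoff c.

    The error is 0 outside [min, max] of all values and constant between
    consecutive distinct values, so summing midpoint value * width is exact.
    """
    points = sorted(set(y_true) | set(y_pred))
    return sum(binarised_error(y_true, y_pred, (lo + hi) / 2) * (hi - lo)
               for lo, hi in zip(points, points[1:]))


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]                  # hypothetical feature values
    ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]      # hypothetical numeric targets
    model = fit_linear(xs, ys)
    preds = [model(x) for x in xs]
    cutoff = 3.5                             # hypothetical operating cutoff
    print("reframed :", [reframe(model, cutoff)(x) for x in xs])
    print("retrained:", [retrain(xs, ys, cutoff)(x) for x in xs])
    print("area under error curve:", area_under_error_curve(ys, preds))
    print("mean absolute error   :", mean_absolute_error(ys, preds))
```

A prediction disagrees with its true value only at cutoffs lying between the two. Integrating the 0/1 error over all cutoffs therefore recovers |y - ŷ| for each example. This is why the last two printed numbers coincide up to floating-point rounding under the assumed definition.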
