A survey on Neyman‐Pearson classification and suggestions for future research

Xin Tong, Yang Feng, Anqi Zhao

doi:10.1002/wics.1376

Abstract

In statistics and machine learning, classification studies how to automatically learn to make good qualitative predictions (i.e., assign class labels) based on past observations. Examples of classification problems include email spam filtering, fraud detection, and market segmentation. Binary classification, in which the potential class label is binary, arguably has the most widely used machine learning applications. Most existing binary classification methods target the minimization of the overall classification risk and may fail to serve some real‐world applications such as cancer diagnosis, where users are more concerned with the risk of misclassifying one specific class than the other. The Neyman‐Pearson (NP) paradigm was introduced in this context as a novel statistical framework for handling asymmetric type I/II error priorities. It seeks classifiers with a minimal type II error subject to a type I error constraint under some user‐specified level. Though NP classification has the potential to be an important subfield in the classification literature, it has not received much attention in the statistics and machine learning communities. This article is a survey on the current status of the NP classification literature. To stimulate readers' research interests, the authors also envision a few possible directions for future research in the NP paradigm and its applications.

WIREs Comput Stat 2016, 8:64–81. doi: 10.1002/wics.1376

This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification
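
The NP criterion described in the abstract (minimal type II error subject to a type I error constraint) can be written as a constrained optimization problem. The following is a sketch under assumed notation that does not appear in the abstract: class 0 is taken to be the prioritized class, \phi denotes a classifier mapping features X to a label in {0, 1}, and \alpha is the user-specified level. The block assumes the amsmath package for \operatorname*.

```latex
% Assumed notation (not from the source): class 0 is the prioritized class,
% \phi is a binary classifier, \alpha \in (0, 1) is the user-specified level.
% Type I error R_0 and type II error R_1:
\[
R_0(\phi) = P\bigl(\phi(X) \neq Y \mid Y = 0\bigr), \qquad
R_1(\phi) = P\bigl(\phi(X) \neq Y \mid Y = 1\bigr).
\]
% NP paradigm: minimize the type II error among classifiers whose
% type I error does not exceed \alpha.
\[
\phi^{*} \in \operatorname*{arg\,min}_{\phi \,:\, R_0(\phi) \le \alpha} R_1(\phi).
\]
```

For contrast, minimizing the overall classification risk corresponds to minimizing P(Y = 0) R_0(\phi) + P(Y = 1) R_1(\phi), which weights the two errors by the class proportions and places no explicit bound on either one.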
