Abstract

Let (X, Y) be a random variable consisting of an observed feature vector X and an unobserved class label Y ∈ {1, 2, . . . , L} with unknown joint distribution. In addition, let D be a training data set consisting of n completely observed independent copies of (X, Y). Instead of providing point predictors (classifiers) for Y, we compute for each b ∈ {1, 2, . . . , L} a p value π_b(X, D) for the null hypothesis that Y = b, treating Y temporarily as a fixed parameter, i.e., we construct a prediction region for Y with a certain confidence. The advantages of this approach over more traditional ones are reviewed briefly. In principle, any reasonable classifier can be modified to yield nonparametric p values. We describe the R package pvclass, which computes nonparametric p values for the potential class memberships of new observations as well as cross-validated p values for the training data. Additionally, it provides graphical displays and quantitative analyses of the p values.

Highlights

  • Let (X, Y ) be a pair of random variables, consisting of an observed feature vector X with values in a feature space X and an unobserved class label Y ∈ Y := {1, 2, . . . , L} with L ≥ 2 possible values

  • In the sequel we provide a brief introduction to the particular paradigm of p values as introduced by Dümbgen, Igl, and Munk (2008)

  • It is closely related to Neyman-Pearson classification, see Scott (2007), Zhao, Feng, Wang, and Tong (2015) and the references cited therein


Summary

Introduction

Let (X, Y) be a pair of random variables, consisting of an observed feature vector X with values in a feature space X and an unobserved class label Y ∈ Y := {1, 2, . . . , L} with L ≥ 2 possible values. Our aim is inference about Y with a given confidence, based on X and certain training data. In the sequel we provide a brief introduction to the particular paradigm of p values as introduced by Dümbgen, Igl, and Munk (2008). It is closely related to Neyman-Pearson classification; see Scott (2007), Zhao, Feng, Wang, and Tong (2015), and the references cited therein.
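The core idea can be sketched as follows: for each candidate class b, one ranks an "atypicality" statistic of the new observation among the class-b training points; the resulting rank-based p value is small when the observation looks unusual for class b, and the set {b : π_b > α} forms a prediction region for Y with approximate confidence 1 − α. The pvclass package itself is written in R; the sketch below is a simplified Python illustration using a hypothetical distance-to-class-mean statistic (the actual package supports more refined statistics, and exact validity requires the exchangeability construction of Dümbgen, Igl, and Munk (2008), which re-fits the statistic with the new observation included).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-class training data (purely illustrative; not from pvclass).
n_per_class = 50
X_train = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 2)),   # class 1
    rng.normal(loc=3.0, scale=1.0, size=(n_per_class, 2)),   # class 2
])
y_train = np.repeat([1, 2], n_per_class)

def p_values(x_new, X_train, y_train, classes=(1, 2)):
    """For each candidate class b, rank an atypicality statistic of x_new
    among the class-b training points. A large statistic means strong
    evidence against Y = b, so the p value is the (smoothed) fraction of
    class-b training points at least as atypical as x_new."""
    pv = {}
    for b in classes:
        Xb = X_train[y_train == b]
        mu = Xb.mean(axis=0)
        t_new = np.linalg.norm(x_new - mu)            # statistic for the new point
        t_train = np.linalg.norm(Xb - mu, axis=1)     # statistics for class-b points
        pv[b] = (1 + np.sum(t_train >= t_new)) / (len(Xb) + 1)
    return pv

x_new = np.array([0.2, -0.1])                 # lies near the class-1 cluster
pv = p_values(x_new, X_train, y_train)
region = [b for b in pv if pv[b] > 0.05]      # prediction region at confidence 0.95
```

Here the prediction region typically contains only class 1, since x_new is atypical for class 2; note that the region may contain several classes (ambiguous cases) or, for outlying observations, none at all, which is precisely the extra information p values offer over a point classifier.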

From classifiers to p values
Example
Optimal p values as benchmark
Training data and nonparametric p values
Cross-validated p values and ROC functions
Data example buerk
Choices of test statistics
Plug-in estimator for standard model
Nearest neighbors and weighted nearest neighbors
Penalized multicategory logistic regression
Implementation and main functions
Classify new observations
Cross-validated p values
Choice of tuning parameters
Numerical examples
Findings
Relation to other classifiers and packages