Abstract

Nearest Shrunken Centroid (NSC) classification has proven successful in ultra-high-dimensional classification problems involving thousands of features measured on relatively few individuals, such as in the analysis of DNA microarrays. The method requires the set of candidate classes to be closed. However, open-set classification is essential in many other applications including speaker identification, facial recognition, and authorship attribution. The authors review closed-set NSC classification, and then propose a diagnostic for whether open-set classification is needed. The diagnostic involves graphical and statistical comparison of posterior predictions of the test vectors to the observed test vectors. The authors propose a simple modification to NSC that allows the set of classes to be open. The open-set modification posits an unobserved class with a distribution of features just barely consistent with the test sample. A tuning constant reflects the combined considerations of power, specificity, multiplicity, number of features, and sample size. The authors illustrate and investigate properties of the diagnostic test and open-set NSC classification procedure using several example data sets. The diagnostic and the open-set NSC procedures are shown to be useful for identifying vectors that are not consistent with any of the candidate classes.
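
As a rough illustration of the closed-set idea summarized above, the following minimal sketch shrinks each class centroid toward the overall centroid by soft thresholding and assigns a test vector to the nearest shrunken centroid. It is a simplification, not the authors' procedure: it omits the per-feature standardization used in standard NSC, and the function names, the toy data, and the `reject_distance` cutoff are all hypothetical. In particular, `reject_distance` is only a crude stand-in for an open-set rule and is not the tuning constant described in the abstract.

```python
from statistics import fmean


def soft_threshold(x, delta):
    """Shrink x toward zero by delta; values within delta of zero become 0."""
    if x > delta:
        return x - delta
    if x < -delta:
        return x + delta
    return 0.0


def fit_shrunken_centroids(X, y, delta):
    """Return {label: shrunken centroid}. Simplified: no per-feature scaling."""
    n_features = len(X[0])
    overall = [fmean(row[j] for row in X) for j in range(n_features)]
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [
            overall[j] + soft_threshold(fmean(r[j] for r in rows) - overall[j], delta)
            for j in range(n_features)
        ]
    return centroids


def classify(x, centroids, reject_distance=None):
    """Nearest shrunken centroid by squared Euclidean distance.

    If reject_distance is given (a hypothetical cutoff, not the paper's rule),
    return None when x is farther than that from every candidate class.
    """
    dists = {
        label: sum((a - b) ** 2 for a, b in zip(x, c))
        for label, c in centroids.items()
    }
    best = min(dists, key=dists.get)
    if reject_distance is not None and dists[best] > reject_distance:
        return None
    return best


# Toy data (made up for illustration): only the first feature separates classes.
X = [[0.0, 0.0, 1.0], [0.2, 0.0, 1.1], [4.0, 0.1, 1.0], [4.2, -0.1, 0.9]]
y = ["a", "a", "b", "b"]
centroids = fit_shrunken_centroids(X, y, delta=0.5)

print(classify([0.5, 0.0, 1.0], centroids))
print(classify([20.0, 0.0, 1.0], centroids, reject_distance=25.0))
```

With this toy data, the shrinkage zeroes out the class differences in the second and third features, so only the first feature contributes to the classification. A vector far from both centroids is returned as `None` rather than forced into one of the candidate classes, which is the behavior an open-set procedure aims for.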


Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.