Abstract

Classification is a fundamental problem in machine learning and data mining. Over the past decades, numerous classification methods have been proposed based on different principles. However, most existing classifiers cast classification as an optimization problem and do not address the issue of statistical significance. In this paper, we formulate the binary classification problem as a two-sample testing problem. More precisely, our classification model is a generic framework composed of two steps. In the first step, the distance between the test instance and each training instance is calculated to derive two distance sets, one per class. In the second step, a two-sample test is performed under the null hypothesis that the two sets of distances are drawn from the same cumulative distribution. After these two steps, we have two $p$-values for each test instance, and the test instance is assigned to the class associated with the smaller $p$-value. Essentially, the presented classification method can be regarded as an instance-based classifier built on hypothesis testing. Experimental results on 38 real data sets show that our method achieves the same level of performance as several classic classifiers and performs significantly better than existing testing-based classifiers. Furthermore, we can handle outlying instances and control the false discovery rate of test instances assigned to each class under the same framework.
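To make the two-step framework concrete, below is a minimal sketch in Python. The use of Euclidean distance and of one-sided two-sample Kolmogorov-Smirnov tests (one per direction, yielding the two $p$-values) are assumptions for illustration only; the abstract does not fix either choice, and the framework is agnostic to both.

```python
# Hypothetical sketch of the two-step testing-based classifier described in
# the abstract. Distance metric and test statistic are illustrative choices.
import numpy as np
from scipy.stats import ks_2samp


def classify(x, X_train, y_train):
    # Step 1: distances from the test instance to every training instance,
    # split into two distance sets by class label (Euclidean is an assumption).
    d0 = np.linalg.norm(X_train[y_train == 0] - x, axis=1)
    d1 = np.linalg.norm(X_train[y_train == 1] - x, axis=1)
    # Step 2: two-sample tests under the null that both distance sets share
    # the same cumulative distribution. A small p0 indicates the distances to
    # class 0 are stochastically smaller (x lies closer to class 0); p1 is the
    # symmetric case for class 1.
    p0 = ks_2samp(d0, d1, alternative="greater").pvalue
    p1 = ks_2samp(d0, d1, alternative="less").pvalue
    # Assign the test instance to the class with the smaller p-value.
    return (0 if p0 <= p1 else 1), (p0, p1)
```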

Highlights

  • Classification is a fundamental data analysis procedure, which is ubiquitously used across different fields

  • We present a new testing-based classification formulation, in which the null hypothesis is, informally, that the test instance does not belong to any class

  • The testing-based classification model has the advantage of controlling the false discovery rate (FDR) of classified test instances and handling outlying instances under the same framework


Summary

INTRODUCTION

Classification is a fundamental data analysis procedure that is ubiquitously used across different fields. We present a new testing-based classification formulation, in which the null hypothesis is, informally, that the test instance does not belong to any class. The type I error, in terms of the FDR, is easy to control in our formulation, since the $p$-values of each test instance with respect to the different classes are generated in the classification phase. In other words, this testing-based classification formulation provides a unified framework for controlling asymmetric classification errors in a natural way. We assign the test instance to the class that has the smallest $p$-value.
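As a hedged illustration of the FDR-control and outlier-handling claims, one natural instantiation (an assumption on our part; this summary does not name the procedure used in the paper) applies the Benjamini-Hochberg procedure to the per-class $p$-values of all test instances: an instance is assigned to a class only when the corresponding null ("the instance does not belong to this class") is rejected, and instances rejected for neither class are flagged as outliers rather than forced into a class.

```python
# Hypothetical FDR-control step via Benjamini-Hochberg; the choice of BH and
# the helper name bh_reject are assumptions for illustration.
import numpy as np


def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg: boolean mask of rejected null hypotheses at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank passing the BH threshold
        reject[order[:k + 1]] = True
    return reject


# p0, p1: arrays of per-class p-values for all test instances (e.g., from the
# classification sketch above). Instances whose null is rejected for neither
# class are treated as outliers instead of being forced into a class:
#   keep0, keep1 = bh_reject(p0), bh_reject(p1)
#   outliers = ~(keep0 | keep1)
```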

The remaining sections cover k-NN variants, the choice of testing methods, handling outliers and FDR control, and the relationship to other approaches, including the connection to the nearest centroid classifier, followed by the conclusion.