Robust Distance Measures for kNN Classification of Cancer Data.

Rezvan Ehsani, Finn Drabløs

doi:10.1177/1176935120965542

Open Access

Abstract

The k-Nearest Neighbor (kNN) classifier represents a simple and very general approach to classification. Still, the performance of kNN classifiers can often compete with more complex machine-learning algorithms. The core of kNN depends on a “guilt by association” principle where classification is performed by measuring the similarity between a query and a set of training patterns, often computed as distances. The relative performance of kNN classifiers is closely linked to the choice of distance or similarity measure, and it is therefore relevant to investigate the effect of using different distance measures when comparing biomedical data. In this study on classification of cancer data sets, we have used both common and novel distance measures, including the novel distance measures Sobolev and Fisher, and we have evaluated the performance of kNN with these distances on 4 cancer data sets of different types. We find that the performance when using the novel distance measures is comparable to the performance with more well-established measures, in particular for the Sobolev distance. We define a robust ranking of all the distance measures according to overall performance. Several distance measures show robust performance in kNN over several data sets, in particular the Hassanat, Sobolev, and Manhattan measures. Some of the other measures show good performance on selected data sets but seem to be more sensitive to the nature of the classification data. It is therefore important to benchmark distance measures on similar data prior to classification to identify the most suitable measure in each case.
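
To make the benchmarking idea concrete, the following is a minimal sketch of a kNN classifier with a pluggable distance function, scored by leave-one-out accuracy. It is not the authors' implementation. The function names, the toy data, and the choice of leave-one-out evaluation are assumptions made for illustration, and the Hassanat formula follows the definition commonly given in the literature, which should be checked against the original reference before serious use.

```python
import math
from collections import Counter


def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hassanat(a, b):
    # Assumed form of the Hassanat distance: each dimension contributes
    # a value in [0, 1), with a shift applied when the smaller value is negative.
    total = 0.0
    for x, y in zip(a, b):
        lo, hi = min(x, y), max(x, y)
        shift = abs(lo) if lo < 0 else 0.0
        total += 1.0 - (1.0 + lo + shift) / (1.0 + hi + shift)
    return total


def knn_predict(train_X, train_y, query, k, distance):
    # Rank training patterns by distance to the query and take a majority
    # vote among the k nearest ("guilt by association").
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k, distance):
    # Classify each sample using all remaining samples as training data.
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k, distance) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: two well-separated groups, not real cancer data.
X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (4.8, 5.1), (5.2, 4.9)]
y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

for name, dist in [("Manhattan", manhattan), ("Euclidean", euclidean), ("Hassanat", hassanat)]:
    print(name, leave_one_out_accuracy(X, y, k=3, distance=dist))
```

On real data the measures would be expected to differ, which is why running such a comparison on data similar to the target data set is useful before fixing a distance measure.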

Highlights

The k-nearest neighbor (kNN) approach was proposed by Fix and Hodges in 1951[1] and later modified by Cover and Hart in 1967.[2] It is a simple, robust and versatile algorithm for classification and regression and has been used for different types of problems such as pattern recognition,[3] ranking of models,[4] text categorization,[5] and object recognition,[6] and in many different areas, including bioinformatics and medicine.[7,8,9] It is a non-parametric[10] and lazy learning classifier.
Classification and pattern recognition are important challenges in data analysis.

Summary

Introduction

The k-nearest neighbor (kNN) approach was proposed by Fix and Hodges in 1951[1] and later modified by Cover and Hart in 1967.[2] It is a simple, robust and versatile algorithm for classification and regression and has been used for different types of problems such as pattern recognition,[3] ranking of models,[4] text categorization,[5] and object recognition,[6] and in many different areas, including bioinformatics and medicine.[7,8,9] It is a non-parametric[10] and lazy learning classifier. The kNN algorithm is conceptually simple but can still be used on complex biological data, for example, from cancer. The popularity of kNN seems to be increasing; the largest number of hits for both kNN itself and the combination of kNN and cancer is found in 2019, and for the combination of kNN and cancer more than 60% of the hits are found in 2014 or later.

Journal: Cancer Informatics	Publication Date: Jan 1, 2020
Citations: 42	License type: CC BY 4.0
