Abstract

Nearest neighbor algorithms such as k-nearest neighbors (kNN) are fundamental supervised learning techniques that classify a query instance based on the class labels of its neighbors. However, large datasets are often not fully labeled, and the unknown probability distribution of the instances may be uneven. Moreover, kNN suffers from challenges such as the curse of dimensionality, the choice of the optimal number of neighbors, and poor scalability on high-dimensional data. To overcome these challenges, we propose an improved approach to classification via a depth representation of subspace clusters formed from high-dimensional data. We offer a consistent and principled approach to dynamically choose the nearest neighbors for classifying a query point by i) identifying the structure and distribution of the data; ii) extracting relevant features; and iii) deriving an optimal value of k, depending on the structure of the data, by representing the data with a data depth function. The proposed classification algorithm, built on this depth-based representation of clusters, improves performance in terms of both execution time and accuracy. Experiments on real-world datasets reveal that the proposed approach is at least two orders of magnitude faster on high-dimensional datasets and is at least as accurate as traditional kNN.
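The abstract does not spell out which depth function or clustering scheme the method uses. As an illustration only, the sketch below assumes Mahalanobis depth as the depth function and classifies a query by the label of the cluster in which it attains the greatest depth; the function names, the regularization term, and the one-label-per-cluster assumption are ours, not the paper's.

```python
import numpy as np

def mahalanobis_depth(x, cluster):
    """Depth of point x relative to a cluster: 1 / (1 + squared Mahalanobis distance).
    Larger values mean x lies deeper inside the cluster."""
    mu = cluster.mean(axis=0)
    # Regularize the covariance so it stays invertible for small or flat clusters (assumption)
    cov = np.cov(cluster, rowvar=False) + 1e-6 * np.eye(cluster.shape[1])
    diff = x - mu
    d2 = diff @ np.linalg.inv(cov) @ diff
    return 1.0 / (1.0 + d2)

def classify_by_depth(x, clusters, cluster_labels):
    """Assign x the label of the cluster in which it is deepest (illustrative rule only)."""
    depths = [mahalanobis_depth(x, c) for c in clusters]
    return cluster_labels[int(np.argmax(depths))]
```

Because each cluster is summarized by its depth contours rather than by every member point, a query only needs to be compared against a handful of cluster summaries instead of the full training set, which is where the claimed speed-up would come from.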

Highlights

  • The k-nearest neighbors algorithm is a simple nonparametric classification technique which is efficient, provided it is given a good distance metric and has enough labeled training data [1]

  • We offer a consistent and principled approach to dynamically choose the nearest neighbors for classifying a query point by i) identifying the structure and distribution of the data; ii) extracting relevant features; and iii) deriving an optimal value of k, depending on the structure of the data, by representing the data with a data depth function.

  • K is the total number of clusters, t_j^(A_i) is an A_i-dimensional instance assigned to cluster S_i, m_i is the medoid of cluster S_i, and a_(u,i) denotes the value of the u-th attribute of m_i (illustrated in the sketch after this list).
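The notation in the last highlight is given without its surrounding equations. The following sketch, assuming plain Euclidean distance within a cluster, shows one way the medoid m_i of a cluster S_i could be obtained, after which a_(u,i) is simply the u-th component of m_i; the helper name cluster_medoid is hypothetical.

```python
import numpy as np

def cluster_medoid(S_i):
    """Medoid m_i of cluster S_i: the member instance with the smallest total
    Euclidean distance to all other members (distance choice is an assumption)."""
    # Pairwise distance matrix between all instances in the cluster
    dists = np.linalg.norm(S_i[:, None, :] - S_i[None, :, :], axis=2)
    m_i = S_i[np.argmin(dists.sum(axis=1))]
    # a_(u,i), the value of the u-th attribute of m_i, is then simply m_i[u]
    return m_i
```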


Summary

Introduction

The k-nearest neighbors (kNN) algorithm is a simple nonparametric classification technique that is effective, provided it is given a good distance metric and enough labeled training data [1]. If we have no knowledge of the underlying distribution, the decision to classify a data point x into a class y depends on a set of known samples (x1, y1), ..., (xn, yn). The available data points may not be fully labelled. Existing techniques for classifying such datasets are either not scalable or not accurate, because they fail to identify the intricate relationships among the features of the data. Traditional algorithms suited to low-dimensional data therefore need to be improved to handle these scenarios.
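For reference, a minimal sketch of the classical kNN decision rule described above, using Euclidean distance and majority voting over the k closest labeled samples; the helper name knn_classify and the default k are ours, not the paper's.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=5):
    """Classify x_query by majority vote among its k nearest labeled samples."""
    # Euclidean distance from the query to every labeled training point
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority class label among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]
```

Note that this baseline compares the query against every training point and uses a fixed k, which is exactly the cost and tuning burden the depth-based approach in this paper aims to avoid.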

