Learning-based text classifiers using the Mahalanobis distance for correlated datasets

Noopur Srivastava, Shrisha Rao

doi:10.1504/ijbdi.2016.073901

Abstract

We present a novel approach to text categorisation that uses the Mahalanobis distance measure for classification. For correlated datasets, classification using the Euclidean distance is not very accurate, whereas the Mahalanobis distance exploits the correlation in the data. To achieve this on large datasets, an unsupervised dimensionality reduction technique, principal component analysis (PCA), is applied before classification with the k-nearest neighbours (kNN) classifier. Because kNN does not work well for high-dimensional data, and computing correlations for huge, sparse data is inefficient, we use PCA to obtain a reduced dataset for the training phase. Experimental results on huge datasets show that the proposed algorithm improves classification accuracy and significantly reduces the error percentage in comparison with classifiers using the Euclidean distance.
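
The pipeline outlined in the abstract (PCA, then kNN with the Mahalanobis distance) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes NumPy, uses small synthetic dense vectors in place of real text features, and every function name and parameter value (`pca_fit`, `pca_transform`, `mahalanobis_sq`, `knn_predict`, two components, `k=1`) is hypothetical.

```python
import numpy as np
from collections import Counter

def pca_fit(X, n_components):
    """Return the training mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project X onto the retained principal directions."""
    return (X - mean) @ components.T

def mahalanobis_sq(x, Y, cov_inv):
    """Squared Mahalanobis distance from x to every row of Y."""
    diff = Y - x
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def knn_predict(x, train_X, train_y, cov_inv, k=3):
    """Majority vote among the k nearest training points."""
    d = mahalanobis_sq(x, train_X, cov_inv)
    nearest = np.argsort(d)[:k]
    return Counter(int(train_y[i]) for i in nearest).most_common(1)[0][0]

# Synthetic, correlated stand-in for document feature vectors (assumption).
rng = np.random.default_rng(0)
mixing = rng.normal(size=(5, 5))             # induces correlation between features
X_train = np.vstack([
    rng.normal(size=(20, 5)) @ mixing,        # class 0
    rng.normal(size=(20, 5)) @ mixing + 3.0,  # class 1, shifted
])
y_train = np.array([0] * 20 + [1] * 20)

# Reduce dimensionality first, then estimate the covariance in the reduced space.
mean, comps = pca_fit(X_train, n_components=2)
Z_train = pca_transform(X_train, mean, comps)
cov_inv = np.linalg.pinv(np.cov(Z_train, rowvar=False))

# Classify a query point (here simply the first training vector, with k=1).
pred = knn_predict(Z_train[0], Z_train, y_train, cov_inv, k=1)
```

In this sketch the covariance is estimated on the PCA-reduced training data, which keeps the matrix small enough to invert cheaply. Whether the paper estimates it at this stage is not stated in the abstract.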


Journal: International Journal of Big Data Intelligence
Publication Date: Jan 1, 2016
Citations: 8

Similar Papers

Evaluation of effect of unsupervised dimensionality reduction techniques on automated arrhythmia classification
Rekha Rajagopal ... Vidhyapriya Ranganathan
Biomedical Signal Processing and Control | VOL. 34
10 Jan 2017

Hyperspectral Imaging for Differentiating Colonies of Non-O157 Shiga-Toxin Producing Escherichia Coli (STEC) Serogroups on Spread Plates of Pure Cultures
Seung-Chul Yoon ... Neelam Narang
Journal of Near Infrared Spectroscopy | VOL. 21
01 Jan 2013

Distance Metric Learning for Large Margin Nearest Neighbor Classification
...
Journal of Machine Learning Research | VOL. 10
01 Dec 2009

Decision Support System for Classification of Early Childhood Diseases Using Principal Component Analysis and K-Nearest Neighbors Classifier
Damar Dananjaya ... Rini Semiati
Journal of Information Systems Engineering and Business Intelligence | VOL. 5
25 Apr 2019
