Distance-based features in pattern classification

Chih-Fong Tsai, Wei-Yang Lin, Zhen-Fu Hong, Chung-Yang Hsieh

doi:10.1186/1687-6180-2011-62

Abstract

In data mining and pattern classification, feature extraction and representation are an important step, since the extracted features have a direct and significant impact on classification accuracy. Numerous novel feature extraction and representation methods have been proposed in the literature. However, many of them focus only on specific domain problems. In this article, we introduce a novel distance-based feature extraction method for various pattern classification problems. Specifically, two distances are extracted, based on (1) the distance between a data point and its intra-cluster center and (2) the distance between the data point and its extra-cluster centers. Experiments are conducted on ten datasets containing different numbers of classes, samples, and dimensions. The experimental results using naïve Bayes, k-NN, and SVM classifiers show that concatenating the original features provided by the datasets with the distance-based features can improve classification accuracy, except on image-related datasets. In particular, the distance-based features are suitable for datasets with fewer classes, fewer samples, and lower feature dimensionality. Moreover, two additional datasets with similar characteristics are used to validate this finding. The result is consistent with that of the first experiment: adding the distance-based features can improve classification performance.

Highlights

Data mining has received unprecedented attention in recent years
The novel distance-based features proposed in this article are examined over a number of different pattern classification problems, and the distance-based features and the original features are concatenated to form a new feature representation for classification
Since feature extraction and representation have a direct and significant impact on the classification performance, we introduce novel distance-based features to improve classification accuracy over various domain datasets

Summary

Introduction

Data mining has received unprecedented attention in recent years. It can be used to analyze huge amounts of data and find valuable information. Pattern classification is an important research topic in the fields of data mining and machine learning. It focuses on constructing a model so that the input data can be assigned to the correct category. In this article, we introduce novel distance-based features to improve classification accuracy. The distance between a data point and its nearest centroid, together with the distances between that point and the other centroids, should provide valuable information for classification.
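The distance-based features described above can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the article's stated procedure: the centroids are taken as plain means of hand-made example clusters, the intra-cluster center is taken to be the nearest centroid, Euclidean distance is used, and the distances to the remaining (extra-cluster) centroids are summed into a single value. The function names are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def distance_features(x, centroids):
    """Return [d_intra, d_extra] for sample x.

    d_intra: Euclidean distance to the nearest centroid
             (assumed here to be the intra-cluster center).
    d_extra: sum of the distances to all other centroids
             (summing is an assumption made for this sketch).
    """
    dists = [math.dist(x, c) for c in centroids]
    d_intra = min(dists)
    d_extra = sum(dists) - d_intra
    return [d_intra, d_extra]


def augment(samples, centroids):
    """Concatenate each sample's original features with its distance features."""
    return [list(x) + distance_features(x, centroids) for x in samples]


# Toy data: two hand-made clusters in two dimensions (illustrative only).
cluster_a = [[0.0, 0.0], [0.0, 2.0]]
cluster_b = [[4.0, 1.0], [6.0, 1.0]]
centroids = [centroid(cluster_a), centroid(cluster_b)]

augmented = augment(cluster_a + cluster_b, centroids)
for row in augmented:
    print(row)
```

Each augmented row keeps the original features and appends the two distances, so it can be passed to any classifier that accepts numeric feature vectors. In practice the centroids would come from a clustering step on the training data rather than from hand-made groups.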

Literature review

Accuracy

Sample size

Support vector machines

Distances from extra-cluster center

Experiments

Conclusion


Journal: EURASIP Journal on Advances in Signal Processing
Publication Date: Sep 18, 2011
Citations: 32
License type: CC BY 2.0
