Abstract

This paper presents a novel approach to efficient feature extraction using mutual information (MI). In MI terms, optimal feature extraction creates a feature set that jointly has the largest dependency on the target class. However, it is difficult to obtain an accurate estimate of high-dimensional MI. In this paper, we propose an efficient feature extraction method based on two-dimensional MI estimates. At each step, a new feature is created that attempts to maximize the MI between the new feature and the target class while minimizing redundancy with the features already extracted. We refer to this algorithm as Minimax-MIFX. The effectiveness of the method is evaluated on the classification of electroencephalogram (EEG) signals recorded during imagined hand movement. The results confirm that the classification accuracy obtained by Minimax-MIFX is higher than that achieved by existing feature extraction methods and by the full feature set.
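The greedy criterion described above — maximize relevance to the target class while penalizing redundancy with features already chosen, using only pairwise (two-dimensional) MI estimates — can be sketched as follows. This is an illustrative sketch, not the paper's algorithm: it uses a simple histogram-based MI estimator and performs feature *selection* under the minimax criterion, whereas Minimax-MIFX extracts new features; all function names are hypothetical.

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Pairwise (two-dimensional) MI estimate from a joint histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y
    nz = pxy > 0                             # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def minimax_select(features, target, k):
    """Greedy selection: at each step pick the candidate maximizing
    relevance MI(f; class) minus mean redundancy MI(f; chosen)."""
    chosen = []
    remaining = list(range(features.shape[1]))
    for _ in range(k):
        def score(j):
            rel = mutual_info(features[:, j], target)
            red = (np.mean([mutual_info(features[:, j], features[:, s])
                            for s in chosen]) if chosen else 0.0)
            return rel - red
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Because only pairwise MI estimates appear, the criterion avoids the high-dimensional density estimation that makes joint MI intractable.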

Highlights

  • Classification of the EEG signals associated with mental tasks plays an important role in the performance of most EEG-based brain-computer interfaces (BCIs), and reducing the dimensionality of the raw input variable space is an essential preprocessing step in the classification process

  • To overcome the abovementioned practical obstacle, we propose a heuristic feature extraction method based on the minimal-redundancy-maximal-relevance framework

  • The results indicate that the classification accuracy obtained by the Minimax mutual-information-based feature extraction (Minimax-MIFX) method is generally better than that obtained by other methods

Introduction

Classification of the EEG signals associated with mental tasks plays an important role in the performance of most EEG-based brain-computer interfaces (BCIs), and reducing the dimensionality of the raw input variable space is an essential preprocessing step in the classification process. It has been observed that adding irrelevant features may degrade the performance of classifiers when the number of training samples is small relative to the number of features [1]. These problems can be avoided by selecting relevant features (i.e., feature selection) or by extracting new features that contain maximal information about the class label (i.e., feature extraction). The purpose of principal component analysis (PCA) is to find an orthogonal set of projection vectors, or principal components, for feature extraction from given training data by maximizing the variance of the projected data, with the aim of optimally representing the data in terms of minimal reconstruction error. When used for feature extraction in classification tasks, however, PCA does not exploit the class information associated with the patterns, and maximizing the variance of the projected patterns is not necessarily favorable for discrimination among classes; it is therefore likely to lose some useful discriminating information.
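The limitation described above can be made concrete with a minimal sketch of unsupervised PCA via eigendecomposition of the sample covariance (the function name and setup are illustrative assumptions). Note that the class labels never enter the computation, so a low-variance but highly discriminative direction can be discarded in favor of a high-variance, class-irrelevant one.

```python
import numpy as np

def pca_project(X, n_components):
    """Project X onto its top-variance principal components.
    Class labels are never consulted, which is the limitation at issue."""
    Xc = X - X.mean(axis=0)                   # center the data
    cov = np.cov(Xc, rowvar=False)            # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]  # largest-variance directions
    return Xc @ top
```

For example, if one input dimension is high-variance noise and another carries a small but clean class separation, a one-component PCA projection tracks the noisy dimension and drops the discriminative one.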

