Abstract

Feature selection is critical for effective data analysis and for saving resources. In multi-dimensional datasets, feature selection methods mainly use a filter-based approach to obtain an optimal feature subspace and wrapper methods to search for an optimal feature subset within this space. In the proposed study, two filter-based statistical feature selection methods, viz. statistical t-test ranking with principal component analysis (PCA) and Separability & Correlation (SEPCOR) analysis, are applied to identify patterns with high discrimination between wake and stage 1 sleep in an 8-channel (6 active + 2 reference electrodes) electroencephalogram (EEG) sleep dataset. The feature set consists of 6-dimensional spectral entropy (SE) vectors computed over EEG epochs of one second duration. In the first method, SE features are ranked by a t-test statistic so that the top-ranked channels maximize class separation between wakefulness and stage 1 sleep. Prior to classification, PCA is performed on the ranked and non-ranked feature subsets to study the contribution of ranked channels to classifier performance. The second method uses SEPCOR analysis to automatically select an optimal feature subset with low correlation among the chosen features and maximum separation between their class means. The correlation threshold is varied heuristically from 0.6 to 0.75 in steps of 0.05 to select different feature subsets. The optimal feature subsets are evaluated using multilayer perceptron (MLP) network and k-nearest neighbor (k-NN) classifiers with 50% hold-out cross-validation. For ranked feature subsets with N = 3, 4 and 5 channels, the k-NN classifier outperforms the MLP network as the number of principal components (PCs) increases. Results indicate that the PCs of ranked channels enhance the performance of the k-NN classifier, whereas the MLP network shows only a marginal improvement with ranking for N ≤ 4 channels.
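The feature extraction and ranking steps of the first method can be sketched in Python under stated assumptions. Here spectral entropy is taken as the normalized Shannon entropy of a plain DFT power spectrum of one single-channel epoch, and channels are ranked by the absolute value of Welch's two-sample t statistic. The function names, the choice of Welch's (rather than Student's) t, the absence of windowing or band limits, and the normalization are all assumptions for illustration, not the authors' exact procedure; the PCA and classifier stages are not shown.

```python
import cmath
import math
import statistics


def spectral_entropy(epoch):
    """Normalized spectral entropy of one single-channel epoch, in [0, 1].

    Assumption: power comes from a plain DFT (no window, no band limits) over
    the positive-frequency bins 1..n/2, is normalized to sum to 1, and the
    Shannon entropy is divided by log(number of bins).
    """
    n = len(epoch)
    mean = sum(epoch) / n
    x = [v - mean for v in epoch]  # remove the DC offset
    power = []
    for k in range(1, n // 2 + 1):  # positive-frequency bins only
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    if total == 0 or len(power) < 2:
        return 0.0  # flat signal or too few bins: entropy undefined, report 0
    probs = [p / total for p in power]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def welch_t(a, b):
    """Welch's two-sample t statistic between samples a and b (>= 2 each)."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_channels(wake, stage1):
    """Return channel indices sorted by |t| between the two classes, largest first.

    wake, stage1: lists of per-epoch SE vectors (one value per channel).
    """
    n_channels = len(wake[0])
    scores = []
    for c in range(n_channels):
        t = welch_t([vec[c] for vec in wake], [vec[c] for vec in stage1])
        scores.append((abs(t), c))
    return [c for _, c in sorted(scores, reverse=True)]
```

Applied to each of the six active channels, `spectral_entropy` would give one 6-dimensional SE vector per epoch; `rank_channels` then orders the channels so that the top N can be passed on to PCA. A pure tone concentrates its power in one bin and scores near 0, while broadband noise scores close to 1.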
As the number of PCs is varied from 2 to 4 in steps of one, the classification accuracy of the k-NN classifier with ranking improves by approximately 2% over its non-ranked counterparts. Under the same conditions, the MLP with 50 hidden neurons improves by only about 1% with ranking. With N = 4 ranked channels, the k-NN classifier reaches maximum accuracies of 96.43%, 95.7% and 94.10% for 4, 3 and 2 PCs respectively, compared with 94.71%, 93.13% and 92% for N = 4 non-ranked channels. The SEPCOR results show that as the correlation threshold increases from 0.6 to 0.75 in steps of 0.05, the method automatically selects subsets of 2, 3, 4 and 5 features, which yield detection accuracies of 72.4%, 80%, 91.6% and 92% with the k-NN classifier and higher accuracies of 73%, 85%, 95.6% and 95.8% with the MLP network (50 hidden neurons), respectively. Spectral entropy feature ranking gives better classification results with the k-NN classifier than the non-ranked cases, whereas the features obtained by SEPCOR analysis are better discriminators with the MLP network for classifying wake/stage 1 sleep data. The k-NN classifier computes faster, and its speed is independent of the value of k, whereas the MLP takes much more computation time for training, depending on the number of hidden neurons.
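The SEPCOR selection step can be sketched as a greedy loop: order the features by a separability score, then keep a feature only if its absolute correlation with every feature already kept is below the threshold. The separability score |μ_a − μ_b| / (σ_a + σ_b), measuring correlation on the pooled epochs of both classes, and the helper names are assumptions made for illustration; the published SEPCOR criterion may differ in detail.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def separability(a, b):
    """Assumed separability score: |mean_a - mean_b| / (sd_a + sd_b)."""
    gap = abs(statistics.mean(a) - statistics.mean(b))
    spread = statistics.stdev(a) + statistics.stdev(b)
    if spread == 0:
        return math.inf if gap > 0 else 0.0
    return gap / spread


def sepcor_select(class_a, class_b, threshold):
    """Greedy SEPCOR-style subset selection (hypothetical helper).

    class_a, class_b: lists of feature vectors, one row per epoch.
    Features are visited in order of decreasing separability; a feature is
    kept only if its |correlation| with every already-kept feature, measured
    on the pooled epochs of both classes, is below `threshold`.
    """
    n_feat = len(class_a[0])
    cols_a = [[row[j] for row in class_a] for j in range(n_feat)]
    cols_b = [[row[j] for row in class_b] for j in range(n_feat)]
    pooled = [cols_a[j] + cols_b[j] for j in range(n_feat)]
    order = sorted(range(n_feat),
                   key=lambda j: separability(cols_a[j], cols_b[j]),
                   reverse=True)
    chosen = []
    for j in order:
        if all(abs(pearson(pooled[j], pooled[i])) < threshold for i in chosen):
            chosen.append(j)
    return chosen
```

In this sketch a looser threshold (e.g. 0.75 instead of 0.6) generally lets more correlated features through, which is consistent with the growth from 2 to 5 selected features reported in the SEPCOR results; because the loop is greedy, strict growth with the threshold is not guaranteed.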