Abstract

A Deep Boltzmann Machine (DBM) is a deep neural network formed from multiple layers of neurons with nonlinear activation functions. The structure of a DBM enables it to learn very complex relationships between features, and it achieves stronger performance in learning high-level feature representations than conventional Artificial Neural Networks. Feature selection at the input level of Deep Neural Networks has not been well studied, despite its importance in reducing the number of input features processed by the deep learning model and in facilitating understanding of the data. This paper proposes a novel algorithm, Deep Feature Selection (Deep-FS), which is capable of removing irrelevant features from large datasets in order to reduce the number of inputs modelled during the learning process. The proposed Deep-FS algorithm utilizes a Deep Boltzmann Machine and uses knowledge acquired during training to remove features at the beginning of the learning process. Reducing the inputs is important because it prevents the network from learning associations between irrelevant features, which would negatively impact the network's acquired knowledge of the overall distribution of the data. Deep-FS embeds feature selection in the Restricted Boltzmann Machine (RBM) used to train the Deep Boltzmann Machine. The generative property of the RBM is used to reconstruct eliminated features and to compute reconstruction errors, which evaluate the impact of eliminating each feature. The performance of the proposed approach was evaluated in experiments on the MNIST, MIR-Flickr, GISETTE, MADELON and PANCAN datasets. The results show that Deep-FS enables feature selection without loss of accuracy: on the MIR-Flickr dataset, Deep-FS removed 775 input features with no reduction in performance. On the MNIST dataset, Deep-FS reduced the number of input features by more than 45%, reduced the network error from 0.97% to 0.90%, and reduced processing and classification time by more than 5.5%. Compared to classical feature selection methods, Deep-FS also returned higher accuracy. On GISETTE, MADELON and PANCAN, Deep-FS removed 81%, 57% and 77% of the input features, respectively, and reduced classifier training time by 82%, 70% and 85%, respectively. Experiments on datasets comprising large numbers of features and samples reveal that Deep-FS overcomes the main limitations of classical feature selection algorithms. In particular, most classical methods require the number of features to retain to be pre-specified, whereas Deep-FS identifies this number automatically. Deep-FS also performs feature selection faster than classical algorithms, which makes it suitable for deep learning tasks, and it can find features in large and big datasets that are normally stored in data batches for faster and more efficient processing.
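
To make the selection mechanism concrete, the following is a minimal, illustrative sketch of reconstruction-error-based feature elimination with a Restricted Boltzmann Machine, using scikit-learn's BernoulliRBM as a stand-in for the paper's RBM. The helper names (reconstruction_error, select_features), the one-feature-at-a-time zeroing, and the fixed tolerance tol are assumptions made for illustration; the exact Deep-FS criterion and elimination schedule may differ from this sketch.

    import numpy as np
    from sklearn.neural_network import BernoulliRBM

    def reconstruction_error(rbm, X):
        """Mean squared error between X and its one-step Gibbs reconstruction."""
        # gibbs() samples v -> h -> v', so the error is stochastic; averaging
        # over the rows of X keeps the estimate stable enough for a sketch.
        X_recon = rbm.gibbs(X)
        return np.mean((X - X_recon) ** 2)

    def select_features(X, tol=1e-3, n_components=256, seed=0):
        """Return a boolean mask of features whose removal barely hurts reconstruction.

        Hypothetical helper for illustration; X is assumed scaled to [0, 1].
        """
        rbm = BernoulliRBM(n_components=n_components, learning_rate=0.05,
                           n_iter=10, random_state=seed)
        rbm.fit(X)
        baseline = reconstruction_error(rbm, X)
        keep = np.ones(X.shape[1], dtype=bool)
        for j in range(X.shape[1]):
            X_masked = X.copy()
            X_masked[:, j] = 0.0  # "eliminate" feature j by zeroing it out
            # If the RBM reconstructs the data almost as well without feature j,
            # the feature carries little information and can be dropped.
            if reconstruction_error(rbm, X_masked) - baseline < tol:
                keep[j] = False
        return keep

Under these assumptions, a reduced dataset is obtained with keep = select_features(X_train) followed by X_train[:, keep]; zeroing a feature, rather than deleting the column, keeps the trained RBM's weight matrix usable for every candidate feature.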

Highlights

  • In the second set of experiments, the performance of the proposed Deep-FS method, which was applied to the MNIST data in Section 4.1, is tested on the MIR-Flickr dataset obtained from the Flickr.com social photography site [7,47]

  • This paper proposes a novel feature selection algorithm, Deep-FS, which is embedded into the Deep Boltzmann Machine (DBM) classifier

  • The proposed Deep Feature Selection (Deep-FS) method works in conjunction with the feature extraction of the DBM to reduce the number of input features and learning errors


Summary

Introduction

Conventional machine learning methods have limited ability to process raw data, so considerable effort is traditionally placed on feature engineering. An important advantage of deep learning (DL) methods over conventional approaches (e.g. Artificial Neural Networks, Support Vector Machines, Naïve Bayes) is that DL methods integrate feature extraction and learning into a single model, so feature engineering is handled as an integrated rather than a separate task. Eliminating irrelevant and redundant features permanently reduces the dimensionality of the data, which can increase the processing speed and accuracy of the machine learning methods applied. Information processing in DBMs starts with an unsupervised learning stage, in which unlabelled data can be used. This paper focuses on DBMs, which include unsupervised learning in the first stage of their training procedure.
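
As a brief illustration of that unsupervised first stage, the sketch below performs greedy layer-wise RBM pretraining on unlabelled data with scikit-learn's BernoulliRBM. The layer sizes and hyperparameters are illustrative assumptions rather than the paper's settings, and a full DBM additionally refines all layers jointly after this greedy stage.

    import numpy as np
    from sklearn.neural_network import BernoulliRBM

    # Unlabelled data scaled to [0, 1], e.g. flattened MNIST-like images.
    rng = np.random.default_rng(0)
    X = rng.random((1000, 784))

    # Layer 1: an RBM learns features directly from the raw inputs.
    rbm1 = BernoulliRBM(n_components=500, learning_rate=0.05, n_iter=10, random_state=0)
    h1 = rbm1.fit_transform(X)

    # Layer 2: a second RBM learns higher-level features from layer 1's activations.
    rbm2 = BernoulliRBM(n_components=250, learning_rate=0.05, n_iter=10, random_state=0)
    h2 = rbm2.fit_transform(h1)

    # h2 is a high-level representation learned without any labels; a
    # supervised stage (e.g. a classifier trained on h2) can follow.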

