Feature Selection Task Research Articles

Abstract: Feature selection and hyper-parameter optimization (tuning) are two of the most important and challenging tasks in machine learning. To achieve satisfactory performance, every machine learning model has to be adjusted to the specific problem, because no efficient universal approach exists. In addition, most data sets contain irrelevant and redundant features that can degrade the model’s performance. Machine learning can be applied almost everywhere; however, because of the high risks posed by the growing number of malicious phishing websites on the World Wide Web, this research addresses feature selection and tuning for this particular problem. Although many metaheuristics have been devised for both feature selection and machine learning tuning, there is still much room for improvement. Therefore, the research presented in this manuscript tries to improve phishing website detection by tuning an extreme learning machine that uses the most relevant subset of features from phishing website data sets. To accomplish this goal, a novel diversity-oriented social network search algorithm has been developed and incorporated into a two-level cooperative framework. The proposed algorithm has been compared to six other cutting-edge metaheuristic algorithms, which were also implemented in the framework and tested under the same experimental conditions. All metaheuristics are employed in level 1 of the devised framework to perform the feature selection task. The best subset of features obtained is then used as the input to level 2, where all algorithms tune the extreme learning machine. Tuning here refers to the number of neurons in the hidden layers and to the initialization of weights and biases.
For evaluation, three phishing website data sets of different sizes and numbers of classes, retrieved from the UCI and Kaggle repositories, were employed. All methods are compared in terms of classification error, separately for levels 1 and 2 over several independent runs, and in terms of detailed metrics of the final outcomes (the output of level 2), including precision, recall, F1 score, and the areas under the receiver operating characteristic and precision–recall curves. An additional experiment uses only level 2 of the proposed framework to establish metaheuristic performance for extreme learning machine tuning with all features, which represents a large-scale NP-hard global optimization challenge. According to the results of statistical tests, the findings suggest that the proposed diversity-oriented social network search metaheuristic on average performs better than its competitors for both challenges and all data sets. Finally, SHapley Additive exPlanations analysis of the best-performing model was applied to determine the most influential features.
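The two-level idea in the abstract above (select features first, then tune the extreme learning machine on the selected subset) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: plain random search stands in for the metaheuristics, an initialization seed stands in for direct optimization of weights and biases, and the names elm_error and two_level_search are hypothetical. It is not the authors' diversity-oriented social network search algorithm.

```python
import numpy as np

def elm_error(X_tr, y_tr, X_val, y_val, mask, n_hidden, seed):
    """Validation error of a single-hidden-layer ELM using only masked features."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 1.0  # an empty feature subset is treated as the worst case
    rng = np.random.default_rng(seed)
    # Random, untrained input weights and biases (the ELM hidden layer).
    W = rng.uniform(-1.0, 1.0, size=(cols.size, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X_tr[:, cols] @ W + b)
    # One-hot targets; output weights come from a least-squares solve.
    classes = np.unique(y_tr)
    T = (y_tr[:, None] == classes[None, :]).astype(float)
    beta = np.linalg.pinv(H) @ T
    scores = np.tanh(X_val[:, cols] @ W + b) @ beta
    pred = classes[np.argmax(scores, axis=1)]
    return float(np.mean(pred != y_val))

def two_level_search(X_tr, y_tr, X_val, y_val, trials=30, seed=0):
    """Level 1: search feature masks. Level 2: tune the ELM on the best mask."""
    rng = np.random.default_rng(seed)
    n_features = X_tr.shape[1]
    base_hidden, base_seed = 20, 0  # assumed fixed ELM settings for level 1

    # Level 1: random search over binary feature masks (stand-in optimizer).
    best_mask = np.ones(n_features, dtype=bool)
    best_err = elm_error(X_tr, y_tr, X_val, y_val, best_mask, base_hidden, base_seed)
    for _ in range(trials):
        mask = rng.random(n_features) < 0.5
        err = elm_error(X_tr, y_tr, X_val, y_val, mask, base_hidden, base_seed)
        if err < best_err:
            best_mask, best_err = mask, err

    # Level 2: random search over hidden size and initialization seed.
    best_cfg = (base_hidden, base_seed)
    for _ in range(trials):
        n_hidden = int(rng.integers(5, 101))
        init_seed = int(rng.integers(0, 10_000))
        err = elm_error(X_tr, y_tr, X_val, y_val, best_mask, n_hidden, init_seed)
        if err < best_err:
            best_cfg, best_err = (n_hidden, init_seed), err
    return best_mask, best_cfg, best_err
```

In this sketch the level 2 search starts from the level 1 settings, so its reported error can never be worse than the level 1 result on the same validation split.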

Abstract: The rapid development of intelligent technologies and devices has led to a drastic increase in the dimensionality of datasets in recent years. Dimension reduction algorithms, such as feature selection methods, are crucial to addressing this problem. Metaheuristic algorithms are now used extensively in feature selection tasks because of their acceptable computational cost and performance. In this article, a binary-modified version of aphid–ant mutualism (AAM), called binary aphid–ant mutualism (BAAM), is introduced to solve feature selection problems. As in AAM, the intensification and diversification mechanisms in BAAM are modeled via the intercommunication of aphids with other colony members, including aphids and ants. Unlike in AAM, however, the number of colony members can change in each iteration based on the attraction power of their leaders. Moreover, the second- and third-best individuals can take the place of the ringleader and lead the pioneer colony. In addition, a random cross-over operator is used in BAAM to maintain population diversity, prevent premature convergence, and facilitate information sharing between colony members, including aphids and ants. The proposed BAAM is compared with five other feature selection algorithms using several evaluation metrics. Twelve medical and nine non-medical benchmark datasets with different numbers of features, instances, and classes, taken from the University of California, Irvine and Arizona State University repositories, are considered in all experiments. Moreover, a coronavirus disease (COVID-19) dataset is used to validate the effectiveness of BAAM in real-world applications.
Based on the results, the proposed BAAM outperformed the other methods in most cases in terms of classification accuracy with various classifiers (K nearest neighbor, kernel-based extreme learning machine, and multi-class support vector machine), selection of the most informative features, best and mean fitness values, and convergence speed. For instance, on the COVID-19 dataset, BAAM achieved 96.53% average accuracy and selected the most informative feature subset.
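To make the wrapper-style setup in the abstract above more concrete, here is a minimal sketch of two pieces it mentions: a random cross-over on binary feature masks and a fitness based on a K nearest neighbor classifier. The uniform form of the cross-over, the weighted fitness with alpha = 0.99, and the function names are assumptions for illustration only, not the operators defined for BAAM. Class labels are assumed to be non-negative integers.

```python
import numpy as np

def uniform_crossover(parent_a, parent_b, rng):
    """Build a child mask by taking each bit from one parent at random."""
    take_a = rng.random(parent_a.size) < 0.5
    return np.where(take_a, parent_a, parent_b)

def knn_error(X_tr, y_tr, X_val, y_val, mask, k=5):
    """Validation error of a k-NN classifier restricted to the masked features."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 1.0  # no features selected: treat as the worst case
    A, B = X_val[:, cols], X_tr[:, cols]
    # Pairwise Euclidean distances, shape (n_val, n_train).
    dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    # Majority vote among the k nearest training labels.
    pred = np.array([np.bincount(votes).argmax() for votes in y_tr[nearest]])
    return float(np.mean(pred != y_val))

def fitness(X_tr, y_tr, X_val, y_val, mask, alpha=0.99):
    """Lower is better: weighted sum of error and fraction of features kept."""
    err = knn_error(X_tr, y_tr, X_val, y_val, mask)
    ratio = np.count_nonzero(mask) / mask.size
    return alpha * err + (1.0 - alpha) * ratio
```

A child produced by uniform_crossover can only contain bits that appear in one of its parents at the same position, which is how such an operator shares information between individuals without introducing unrelated features.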

Articles published on Feature Selection Task

Addressing feature selection and extreme learning machine tuning by diversity-oriented social network search: an application for phishing websites detection

GBDTMO: as new option for early-stage breast cancer detection and classification using machine learning

Practical Markov Boundary Learning without Strong Assumptions

Boosted local dimensional mutation and all-dimensional neighborhood slime mould algorithm for feature selection

Adaptive cooperative coevolutionary differential evolution for parallel feature selection in high-dimensional datasets

Multi-Target Markov Boundary Discovery: Theory, Algorithm, and Application.

Differential Evolution-Based Feature Selection: A Niching-Based Multiobjective Approach

Incremental feature selection with fuzzy rough sets for dynamic data sets

Feature clustering-Assisted feature selection with differential evolution

A Tensor Method based on Enhanced Tensor Nuclear Norm and Hypergraph Laplacian Regularization for Pan-Cancer Omics Data Analysis.

UDRN: Unified Dimensional Reduction Neural Network for feature selection and feature projection

A novel firefly algorithm approach for efficient feature selection with COVID-19 dataset

Disambiguation-based partial label feature selection via feature dependency and label consistency

A modified binary version of aphid–ant mutualism for feature selection: a COVID-19 case study

DroidRL: Feature selection for android malware detection with reinforcement learning

A Tent Lévy Flying Sparrow Search Algorithm for Wrapper-Based Feature Selection: A COVID-19 Case Study

Unsupervised Adaptive Feature Selection with Binary Hashing.

A filter feature selection for high-dimensional data

Optimized cell type signatures revealed from single-cell data by combining principal feature analysis, mutual information, and machine learning

Crow search algorithm with time varying flight length Strategies for feature selection

