High Number Of Features Research Articles

Biomarker discovery that exploits the feature importance of machine learning models has recently gained ground in microbiome research, owing to its high predictive performance in several disease states. To make a concrete selection from a high number of features, recursive feature elimination (RFE) has been widely used in bioinformatics. However, machine learning-based RFE has factors that decrease the stability of feature selection. In this article, we proposed methods to improve stability while sustaining performance. We used abundance matrices of the gut microbiome (283 taxa at the species level and 220 at the genus level) to classify patients with inflammatory bowel disease (IBD) against healthy controls (1,569 samples). We found that applying an already published data transformation before RFE significantly improves feature stability. Moreover, we performed an in-depth evaluation of different variants of the data transformation and identified those that improve stability most without sacrificing classification performance. To ensure a robust comparison, we evaluated stability using various similarity metrics, distances, the number of common features, and the ability to filter out noise features. We confirmed that mapping by the Bray-Curtis similarity matrix before RFE consistently improves stability while maintaining good performance. Of 8 machine learning algorithms, the multilayer perceptron exhibited the highest performance when a large number of features (a few hundred) were considered, based on the best performance across 100 bootstrapped internal test sets. Conversely, when only a limited number of biomarkers was used as a trade-off between optimal performance and method generalizability, the random forest algorithm performed best. Using the optimal pipeline we developed, we identified 14 biomarkers for IBD at the species level and analyzed their roles using Shapley additive explanations. Taken together, our work not only showed how to improve biomarker discovery in the metataxonomic field without sacrificing classification performance but also provided useful insights for future comparative studies.
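
The two quantities named in the abstract above, Bray-Curtis similarity and the stability of a selected feature set, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the choice of the mean pairwise Jaccard index as the stability measure, and the toy taxa are all assumptions made for illustration.

```python
from itertools import combinations


def bray_curtis_similarity(u, v):
    """Return 1 - Bray-Curtis dissimilarity for two non-negative abundance vectors."""
    total = sum(u) + sum(v)
    if total == 0:
        # Convention assumed here: two all-zero profiles count as identical.
        return 1.0
    return 1.0 - sum(abs(a - b) for a, b in zip(u, v)) / total


def mean_pairwise_jaccard(feature_sets):
    """Average Jaccard index over all pairs of selected-feature sets.

    Each set holds the features chosen in one resampling run
    (for example, one bootstrap round of RFE). 1.0 means every run
    selected exactly the same features.
    """
    if len(feature_sets) < 2:
        raise ValueError("need at least two feature sets to measure stability")
    scores = [
        len(a & b) / len(a | b) if (a | b) else 1.0
        for a, b in combinations(feature_sets, 2)
    ]
    return sum(scores) / len(scores)


# Hypothetical selections from three resampling runs (taxa names are made up).
runs = [
    {"taxon_a", "taxon_b", "taxon_c"},
    {"taxon_a", "taxon_b", "taxon_d"},
    {"taxon_a", "taxon_b", "taxon_c"},
]
stability = mean_pairwise_jaccard(runs)
```

A higher value of `stability` indicates that repeated runs agree more closely on which features to keep, which is the property the abstract reports as improved by the transformation.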

Alzheimer's disease (AD) is a degenerative disorder that attacks nerve cells in the brain. AD leads to memory loss and to cognitive and intellectual impairments that can influence social activities and decision-making. The most common type of human genetic variation is the single nucleotide polymorphism (SNP). SNPs are useful markers of complex gene-disease associations. Many common and serious diseases, such as AD, have associated SNPs. Detecting SNP biomarkers linked with AD could help in the early prediction and diagnosis of this disease. The main objective of this paper is to predict and diagnose AD in its early stages from SNP biomarkers with high classification accuracy. One of the most concerning problems is the high number of features. Thus, the paper proposes a comprehensive framework for early AD detection and for identifying the most significant genes based on SNP analysis. The use of machine learning (ML) techniques to identify new biomarkers of AD is also suggested. In the proposed system, two feature selection techniques are evaluated separately to select the most significant genes related to AD: the information gain filter and the Boruta wrapper. Filter methods measure the relevance of features by their correlation with the dependent variable, while wrapper methods measure the usefulness of a subset of features by training a model on it. A gradient boosting tree (GBT) was applied to the AD genetic data of the Alzheimer's Disease Neuroimaging Initiative phase 1 (ADNI-1) and Whole-Genome Sequencing (WGS) datasets, using each of the two feature selection techniques. In the whole-genome approach on ADNI-1, the GBT learning algorithm scored an overall accuracy of 99.06% with Boruta feature selection. With information gain feature selection, the proposed system achieved an average accuracy of 94.87%. The results show that the proposed system is preferable for the early detection of AD. The results also revealed that the Boruta wrapper feature selection is superior to the information gain filter technique.
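
As a rough illustration of the filter approach described above, the following sketch ranks discrete features by information gain with respect to a class label. It is not the paper's implementation: the function names, the genotype coding (0, 1, 2) and the toy SNP columns are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(columns, labels):
    """Return feature names sorted from most to least informative."""
    return sorted(columns, key=lambda name: information_gain(columns[name], labels),
                  reverse=True)


# Toy data, invented for illustration: 1 = case, 0 = control.
labels = [0, 0, 1, 1]
snps = {
    "snp_noise": [0, 1, 0, 1],        # genotype unrelated to the label
    "snp_informative": [0, 0, 2, 2],  # genotype separates the classes
}
ranking = rank_by_information_gain(snps, labels)
```

A filter of this kind scores each feature independently of any classifier. A wrapper such as Boruta instead judges features by training a model on them, which costs more computation but can capture interactions that a per-feature score misses.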

Articles published on High Number Of Features

Characterizing the Contribution of Dependent Features in XAI Methods.

Artificial Intelligence Workload Allocation Method for Vehicular Edge Computing

Analysis of Variance Combined with Optimized Gradient Boosting Machines for Enhanced Load Recognition in Home Energy Management Systems.

NEW STRATEGIES FOR IMPROVING NETWORK SECURITY AGAINST CYBER ATTACK BASED ON INTELLIGENT ALGORITHMS

A Multi-Model Framework to Explore ADHD Diagnosis from Neuroimaging Data

Adaptive cooperative coevolutionary differential evolution for parallel feature selection in high-dimensional datasets

An integrated clustering and BERT framework for improved topic modeling.

A gradient-based approach for adversarial attack on deep learning-based network intrusion detection systems

Machine learning-based feature selection to search stable microbial biomarkers: application to inflammatory bowel disease.

Kurtosis-Based Feature Selection Method using Symmetric Uncertainty to Predict the Air Quality Index

Early detection of Alzheimer's disease using single nucleotide polymorphisms analysis based on gradient boosting tree

Various Soft Computing Based Techniques for Developing Intrusion Detection Management System

Development and Application of an LC-MS/MS Untargeted Exposomics Method with a Separated Pooled Quality Control Strategy.

Assessing Feature Importance for Short-Term Prediction of Electricity Demand in Medium-Voltage Loads

Incremental Learning Framework for Mining Big Data Stream

High Performance Classification of Cancer Types with Gene Microarray Datasets: Hybrid Approach

Audio based depression detection using Convolutional Autoencoder

Disease Single Nucleotide Polymorphism Selection using Hybrid Feature Selection Technique

FGAAM: A fast and resizable genetic algorithm with aggressive mutation for feature selection

Comparative Analysis of Intrusion Detection Attack Based on Machine Learning Classifiers
