Huge Number Of Features Research Articles

Background: Amyotrophic lateral sclerosis (ALS) is a rare, progressive neurodegenerative disease that affects upper and lower motor neurons. Because the molecular basis of the disease is still elusive, high-throughput sequencing technologies combined with data mining techniques and machine learning methods could yield notable results in identifying pathogenetic mechanisms. High dimensionality is a major problem when applying machine learning to biomedical data, since a huge number of features is available for a limited number of samples. The aim of this study was to develop a methodology for training interpretable machine learning models that classify ALS and ALS-subtype samples using gene expression datasets.

Methods: We reduced the dimensionality of the gene expression data with a semi-automated, systematic gene selection procedure based on Statistically Equivalent Signature (SES), a causality-based feature selection algorithm, and then trained machine learning classifiers with Boosted Regression Trees (XGBoost) and Random Forest. SHapley Additive exPlanations (SHAP values) were used to interpret the classifiers. The methodology was developed and tested on two distinct, publicly available ALS RNA-seq datasets. We compared SES as a dimensionality reduction method against (a) Least Absolute Shrinkage and Selection Operator (LASSO) and (b) Local Outlier Factor (LOF).

Results: The proposed methodology achieved 85.18% accuracy in classifying cerebellum or frontal cortex samples as C9orf72-related familial ALS, sporadic ALS or healthy. Importantly, the genes identified as most determinative have also been reported as disease-associated in the ALS literature. On the evaluation dataset, the methodology achieved 88.89% accuracy in classifying sporadic ALS motor neuron samples. When LASSO was used for feature selection instead of SES, classifier accuracy ranged from 74.07% to 96.30%, depending on the tissue assessed, while LOF performed considerably worse (77.78% accuracy for pooled cerebellum and frontal cortex samples).

Conclusions: Using SES, we addressed the challenge of high dimensionality in gene expression data analysis and trained accurate machine learning ALS classifiers, specific to the gene expression patterns of different disease subtypes and tissue samples, while identifying disease-associated genes.
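SES is not sketched here. For readers unfamiliar with the LASSO comparator named in the abstract, the following minimal pure-Python sketch fits LASSO by cyclic coordinate descent on hypothetical synthetic data with more features than samples (p = 60, n = 40), where only features 0 and 1 drive the response. It assumes the objective (1/2n)·||y − Xβ||² + λ·||β||₁, standardized features, a centred response (so no intercept) and an arbitrary λ = 0.5; it is an illustration of the technique, not the authors' pipeline or settings.

```python
import random


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardize(column: list[float]) -> list[float]:
    """Centre a column to mean 0 and scale it so the mean of its squares is 1."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / sd for v in column]


def lasso_coordinate_descent(
    X: list[list[float]], y: list[float], lam: float, sweeps: int = 200
) -> list[float]:
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes every column of X is standardized and y is centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X @ beta, and beta starts at zero
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the partial residual (its own contribution added back).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + beta[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def make_data(n: int = 40, p: int = 60, seed: int = 0):
    """Hypothetical data: y depends only on features 0 and 1, plus small noise."""
    rng = random.Random(seed)
    raw = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    cols = [standardize([raw[i][j] for i in range(n)]) for j in range(p)]
    X = [[cols[j][i] for j in range(p)] for i in range(n)]
    y = [3.0 * X[i][0] - 2.0 * X[i][1] + rng.gauss(0.0, 0.1) for i in range(n)]
    y_mean = sum(y) / n
    return X, [v - y_mean for v in y]


X, y = make_data()
beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected features:", selected)
print("their coefficients:", [round(beta[j], 3) for j in selected])
```

Features whose coefficients end at exactly zero are discarded, which is why LASSO doubles as a feature selector; a larger λ zeroes more of them. scikit-learn's `Lasso` minimises the same objective with `alpha` in place of λ, and `LassoCV` chooses it by cross-validation, a step this sketch omits.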

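SHAP values are Shapley values of a model's prediction. As a minimal sketch, exact Shapley values can be computed by enumerating every coalition of features, assuming a single baseline vector stands in for "absent" features. The shap library instead averages over background data and, for tree ensembles such as XGBoost and Random Forest, typically uses the specialised TreeSHAP algorithm; the toy model below is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley value of each feature for the prediction f(x).

    A coalition S is evaluated as f(z), where z takes x[i] for i in S and
    baseline[i] otherwise. Cost is O(2^p) model calls, so keep p small.
    """
    p = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(p)]
        return f(z)

    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for k in range(p):  # coalition sizes 0 .. p-1 that exclude feature i
            weight = factorial(k) * factorial(p - k - 1) / factorial(p)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical model: an additive term plus a two-feature interaction."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi)
# Efficiency property: the values sum to f(x) - f(baseline).
print(sum(phi), toy_model(x) - toy_model(baseline))
```

Here feature 0 receives its additive effect (2), and the interaction term z1·z2 = 6 is split equally between features 1 and 2 (3 each), which is the kind of per-gene attribution SHAP provides for a classifier's output.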

Spam is defined as junk or unwanted e-mail. A reliable spam filter is becoming more and more important for e-mail users, who face a growing amount of unsolicited e-mail. Spam classifiers are increasingly unable to handle huge volumes of e-mail and to detect new spam with high performance. A key problem for spam classifiers is the huge number of features. Feature selection, one of the most popular and effective methods for feature reduction, is an important task in keyword-based content classification: it eliminates irrelevant and redundant features that can impede performance. Meta-heuristic optimization searches for the optimal solution among many possible solutions, which suits the aim of this research, namely classification performance. A further problem is that the effect of optimization-based feature selection on the classifiers most widely used in previous work, namely K-Nearest Neighbor, Naive Bayes and Support Vector Machine, remains unclear. The aim of this research is therefore to improve the accuracy of feature selection by hybridizing the Water Cycle algorithm with Simulated Annealing, and to evaluate the resulting spam detection approach. The methodology consists of groundwork, induction, improvement, evaluation and quality comparison phases. Cross-validation was used for training and validation, and seven datasets were used to test the proposed spam classification. Water cycle feature selection (WCFS), a meta-heuristic, was employed together with three ways of hybridizing it with Simulated Annealing for feature selection.
Compared with other feature selection algorithms such as Harmony Search, Genetic Algorithm and Particle Swarm Optimization, the interleaved hybridization performed best, with an accuracy of 96.3%. Among the three classifier algorithms, SVM performed best, with an F-measure of 96.3%. With interleaved Water Cycle and Simulated Annealing, the number of features decreased by more than 50%.
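The abstract does not describe the Water Cycle algorithm or the three hybridization schemes, so only the Simulated Annealing component is sketched here, as a minimal pure-Python wrapper that searches over binary feature masks. Every concrete choice is an assumption for illustration: a one-bit-flip neighbourhood, geometric cooling (T0 = 0.1, factor 0.99), leave-one-out nearest-centroid accuracy as the fitness instead of the KNN, Naive Bayes and SVM classifiers used in the study, a 0.05 penalty on the fraction of features kept, and synthetic data in which only the first three of 20 features are informative.

```python
import math
import random
from typing import Callable


def loo_nearest_centroid_accuracy(X: list[list[float]], y: list[int], mask: list[bool]) -> float:
    """Leave-one-out accuracy of a two-class nearest-centroid classifier on the masked features."""
    idx = [j for j, keep in enumerate(mask) if keep]
    sums = {0: [0.0] * len(idx), 1: [0.0] * len(idx)}
    counts = {0: 0, 1: 0}
    for row, label in zip(X, y):
        counts[label] += 1
        for k, j in enumerate(idx):
            sums[label][k] += row[j]
    correct = 0
    for row, label in zip(X, y):
        best_label, best_dist = -1, math.inf
        for c in (0, 1):
            # Exclude the held-out sample from its own class centroid.
            m = counts[c] - (1 if c == label else 0)
            if m == 0:
                continue
            dist = 0.0
            for k, j in enumerate(idx):
                own = row[j] if c == label else 0.0
                dist += (row[j] - (sums[c][k] - own) / m) ** 2
            if dist < best_dist:
                best_label, best_dist = c, dist
        correct += int(best_label == label)
    return correct / len(y)


def simulated_annealing(
    fitness: Callable[[list[bool]], float], p: int, rng: random.Random,
    iters: int = 500, t0: float = 0.1, cooling: float = 0.99,
):
    """Maximise fitness over non-empty feature masks; returns (best, best_fit, initial_fit)."""
    current = [rng.random() < 0.5 for _ in range(p)]
    if not any(current):
        current[rng.randrange(p)] = True
    cur_fit = fitness(current)
    initial_fit = cur_fit
    best, best_fit = list(current), cur_fit
    temp = t0
    for _ in range(iters):
        cand = list(current)
        j = rng.randrange(p)
        cand[j] = not cand[j]  # neighbour: toggle one feature in or out
        if any(cand):  # never evaluate an empty feature subset
            cand_fit = fitness(cand)
            delta = cand_fit - cur_fit
            # Always accept improvements; accept worse moves with probability exp(delta / T).
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, cur_fit = cand, cand_fit
                if cur_fit > best_fit:
                    best, best_fit = list(current), cur_fit
        temp *= cooling  # geometric cooling schedule
    return best, best_fit, initial_fit


rng = random.Random(42)
n, p, informative = 40, 20, 3
y = [i % 2 for i in range(n)]
# Hypothetical data: class 1 is shifted by +1.5 on the first `informative` features only.
X = [[rng.gauss(1.5 * label if j < informative else 0.0, 1.0) for j in range(p)] for label in y]


def fitness(mask: list[bool]) -> float:
    # Reward LOO accuracy, lightly penalise the fraction of features kept (0.05 is arbitrary).
    return loo_nearest_centroid_accuracy(X, y, mask) - 0.05 * sum(mask) / len(mask)


best_mask, best_fit, initial_fit = simulated_annealing(fitness, p, rng)
print([j for j, keep in enumerate(best_mask) if keep], round(best_fit, 3), round(initial_fit, 3))
```

Accepting some worse moves while the temperature is high lets the search escape local optima; as the temperature falls it behaves like greedy bit-flipping. Swapping the fitness for a cross-validated KNN, Naive Bayes or SVM score would bring the sketch closer to the study's setting.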


Articles published on Huge Number Of Features

Gene targeting in amyotrophic lateral sclerosis using causality-based feature selection and machine learning

Understanding How CNNs Recognize Facial Expressions: A Case Study with LIME and CEM.

A screening method for ultra-high dimensional features with overlapped partition structures.

A Study on Facial Expression Change Detection Using Machine Learning Methods with Feature Selection Technique

Sparse elastic net multi-label rank support vector machine with pinball loss and its applications

Feature Mapping and Deep Long Short Term Memory Network-Based Efficient Approach for Parkinson’s Disease Diagnosis

A New Population Initialization of Particle Swarm Optimization Method Based on PCA for Feature Selection

Machine Learning and Feature Selection Approaches for Categorizing Arabic Text: Analysis, Comparison, and Proposal

Online feature selection system for big data classification based on multi-objective automated negotiation

Binary BAT algorithm and RBFN based hybrid credit scoring model

Review on Trait Selection of Tumor in the Field of Oncology With the Aid of Data Mining

A Multi-Objective Evolutionary Approach for Preprocessing Imbalanced Microarray Datasets

A Brief Conceptual View on Classification Using Support Vector Machine

Unsupervised Feature Selection by Pareto Optimization

Hybrid Water Cycle Optimization Algorithm With Simulated Annealing for Spam E-mail Detection

A framework for event classification in tweets based on hybrid semantic enrichment

An Hour Ahead Electricity Price Forecasting with Least Square Support Vector Machine and Bacterial Foraging Optimization Algorithm

A Comparative Study on using Principle Component Analysis with different Text Classifiers

A Comparative Study of Feature Selection Techniques for Bat Algorithm in Various Applications

Credit Scoring Model based on Weighted Voting and Cluster based Feature Selection
