Improved Feature Selection Research Articles

Background: In computational biology, analyzing complex data helps extract relevant biological information. Sample classification of gene expression data is one popular bio-data analysis technique. However, the presence of a large number of irrelevant or redundant genes in expression data makes sample classification algorithms work inefficiently. Feature selection is a dimensionality reduction technique that helps maximize the effectiveness of any sample classification algorithm. Recent advances in biotechnology have enriched biological data to include multiple modalities or views, and different ‘omics’ resources capture different, equally important biological properties of entities. However, most existing feature selection methods consider only one of these biological resources. Consequently, crucial aspects of the available biological knowledge, which could further improve feature selection, may be ignored.

Results: In this work, we propose a Consensus Multi-View Multi-objective Clustering-based feature selection algorithm called CMVMC. Three controlled genomic and proteomic resources, namely gene expression, Gene Ontology (GO), and the protein-protein interaction network (PPIN), are used to build two independent views. Multi-objective consensus clustering is applied within the proposed gene selection method so that the result satisfies both views. Gene expression data sets for Multiple Tissues and Yeast, from two organisms (Homo sapiens and Saccharomyces cerevisiae, respectively), are used in the experiments. CMVMC produces a reduced set of relevant, non-redundant genes for each data set, and these genes are then used for effective sample classification.

Conclusions: The experimental study on the chosen data sets shows that the proposed feature selection method improves sample classification accuracy and substantially reduces the gene space. For the Multiple Tissues data set, CMVMC reduces the number of genes (features) from 5565 to 41, with a sample classification accuracy of 92.73%. For the Yeast data set, the number of genes is reduced from 2884 to 10, with a sample classification accuracy of 95.84%. Two internal cluster validity indices, Silhouette and Davies-Bouldin (DB), and one external validity index, Classification Accuracy (CA), are used for the comparative study. The reported results are further validated with a well-known biological significance test and a visualization tool.
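The two internal cluster validity indices named above have standard definitions. As an illustration only, the sketch below computes both in plain Python for a single clustering, assuming each gene is a point (for example, its expression profile) and `labels[i]` is its cluster. It is not the CMVMC implementation and scores one Euclidean view only, since how CMVMC evaluates clusters across its expression and GO/PPIN views is not described here; the function names are hypothetical.

```python
from math import dist
from statistics import mean


def silhouette(points, labels):
    """Mean silhouette width over all points, in [-1, 1]; higher is better.

    Assumes at least two clusters.
    """
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: its silhouette is defined as 0
            scores.append(0.0)
            continue
        a = mean(dist(p, q) for q in own)  # cohesion: mean distance inside own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean(dist(p, q) for q, lab in zip(points, labels) if lab == c)
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower means compact, well-separated clusters.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = sorted(set(labels))
    members = {c: [p for p, lab in zip(points, labels) if lab == c] for c in clusters}
    centroid = {c: [mean(axis) for axis in zip(*members[c])] for c in clusters}
    # S_c: average distance from a cluster's members to its centroid
    spread = {c: mean(dist(p, centroid[c]) for p in members[c]) for c in clusters}
    # For each cluster, take its worst (most similar) pairing, then average over clusters
    return mean(
        max((spread[i] + spread[j]) / dist(centroid[i], centroid[j])
            for j in clusters if j != i)
        for i in clusters
    )


if __name__ == "__main__":
    # Toy "genes": two tight pairs of profiles, ten units apart.
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette(pts, [0, 0, 1, 1]), davies_bouldin(pts, [0, 0, 1, 1]))
    print(silhouette(pts, [0, 1, 0, 1]))  # mismatched grouping
```

With two tight pairs of points ten units apart, the correct grouping gives a silhouette of about 0.90 and a DB index of 0.1, while the mismatched grouping gives a negative silhouette. Higher silhouette and lower DB both indicate compact, well-separated clusters.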

Although there have been many attempts to build an optimal feature selection model for Big Data applications, the complexity of processing such data means that it remains a major challenge. The high dimensionality and complexity of huge data sets can obstruct the data mining process. To identify the most informative features and optimize classification accuracy, feature selection is a mandatory pre-processing phase that reduces dataset dimensionality. An exhaustive search for the relevant features is time-consuming, because a dataset with n features has 2^n candidate subsets. In this paper, a new binary variant of wrapper feature selection combining grey wolf optimization (GWO) and particle swarm optimization (PSO) is proposed. A K-nearest neighbor (KNN) classifier with the Euclidean distance metric is used to evaluate candidate feature subsets in the search for the optimal solution. A tent chaotic map helps prevent the algorithm from becoming trapped in local optima. A sigmoid function converts the search space from continuous vectors to binary ones, as required for feature selection. K-fold cross-validation is used to mitigate overfitting. The proposed model is compared with well-known algorithms such as particle swarm optimization and grey wolf optimization. Twenty datasets are used in the experiments, and statistical analyses are conducted to confirm the performance and effectiveness of the proposed model using measures such as the selected-feature ratio, classification accuracy, and computation time. Across the twenty datasets, the proposed model selected 196 of 773 features in total, compared with 393 for GWO and 336 for PSO. Its overall accuracy is 90%, compared with 81.6% and 86.8% for the other two algorithms. The total processing time for all datasets is 184.3 seconds, compared with 272 and 245.6 seconds for GWO and PSO, respectively.
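The abstract above lists the building blocks of the wrapper but not their formulas. The sketch below shows one common way to implement each in plain Python, under stated assumptions: a skewed tent map with breakpoint 0.7, a logistic (sigmoid) transfer with a random threshold, a KNN classifier with Euclidean distance scored by K-fold cross-validation, and a fitness that weights error against the selected-feature ratio with an assumed `alpha = 0.99`. The GWO and PSO position-update rules, and how the proposed model combines them, are not given in the abstract and are deliberately left out; all function names are hypothetical.

```python
import math
import random
from collections import Counter


def tent_sequence(x0, n):
    """n chaotic values in [0, 1] from a skewed tent map (0.7 breakpoint is an assumed choice)."""
    values, x = [], x0
    for _ in range(n):
        x = x / 0.7 if x < 0.7 else (1 - x) / 0.3
        x = min(max(x, 0.0), 1.0)  # guard against floating-point drift outside [0, 1]
        values.append(x)
    return values


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + e^-x)."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)


def to_binary(position, rng):
    """S-shaped transfer: feature d is selected with probability sigmoid(position[d])."""
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training rows closest to x in Euclidean distance."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_error(X, y, mask, folds=5, k=5):
    """K-fold cross-validated KNN error using only the columns where mask[d] == 1."""
    cols = [d for d, bit in enumerate(mask) if bit]
    Xs = [[row[d] for d in cols] for row in X]
    wrong = 0
    for f in range(folds):  # fold f holds out every row whose index % folds == f
        train = [i for i in range(len(X)) if i % folds != f]
        tX, ty = [Xs[i] for i in train], [y[i] for i in train]
        wrong += sum(knn_predict(tX, ty, Xs[i], k) != y[i]
                     for i in range(len(X)) if i % folds == f)
    return wrong / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Value to minimise: alpha * CV error + (1 - alpha) * fraction of features kept."""
    if not any(mask):  # an empty subset cannot be classified; give it the worst score
        return 1.0
    return alpha * cv_error(X, y, mask) + (1 - alpha) * sum(mask) / len(mask)


if __name__ == "__main__":
    # Toy data: column 0 separates the two classes, column 1 is noise.
    X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [0.3, 3.0], [0.4, 7.0], [0.5, 2.0],
         [5.0, 4.0], [5.1, 8.0], [5.2, 0.0], [5.3, 6.0], [5.4, 1.5], [5.5, 9.5]]
    y = [0] * 6 + [1] * 6
    rng = random.Random(0)
    # Hypothetical chaotic initialisation: map tent values in [0, 1] to positions in [-4, 4].
    position = [8 * v - 4 for v in tent_sequence(0.37, len(X[0]))]
    mask = to_binary(position, rng)
    print(mask, fitness(X, y, mask))
    print(fitness(X, y, [1, 0]), fitness(X, y, [0, 1]))
```

On the toy data, keeping only the informative column gives a fitness of about 0.005 (no cross-validation errors, half the features kept), while keeping only the noise column scores worse. In a full GWO or PSO run, each agent's continuous position would be binarised this way and scored with `fitness` at every iteration; that outer search loop is omitted here.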

Articles published on Improved Feature Selection

Explainable Machine Learning for Default Privacy Setting Prediction

Improved Feature Selection Method for the Identification of Soil Images Using Oscillating Spider Monkey Optimization

A gray wolf algorithm for feature and parameter selection of support vector classification

An improve feature selection algorithm for defect detection of glass bottles

Application of Fuzzy Entropy to Improve Feature Selection for Defect Recognition Using Support Vector Machine in High Voltage Cable Joints

A novel method of constrained feature selection by the measurement of pairwise constraints uncertainty

Wool knitted fabric pilling objective evaluation based on double-branch convolutional neural network

A consensus multi-view multi-objective gene selection approach for improved sample classification

A new approach to health condition identification of rolling bearing using hierarchical dispersion entropy and improved Laplacian score

Learning robust affinity graph representation for multi-view clustering

Modelling an Effectual Glowworm Swarm Optimization Strategy for Feature Selection in Heart Disease Prediction

Enhancing BCI-Based Emotion Recognition Using an Improved Particle Swarm Optimization for Feature Selection.

Feature selection via normative fuzzy information weight with application into tumor classification

Sparse multiple co-Inertia analysis with application to integrative analysis of multi-Omics data

E-Commerce data classification in the cloud environment based on bayesian algorithm

Improved Feature Selection Model for Big Data Analytics

Prediction of Adverse Drug Reactions Using Improved Feature Selection and Modified Fuzzy based Variability Ratio Tuning for SVM Classifier

Forecasting Plantago pollen: improving feature selection through random forests, clustering, and Friedman tests

A Feature Selection Method based on the Pearson’s Correlation and Transformed Divergence Analysis

An Improved Feature Selection Method for Short Text Classification
