High-dimensional Microarray Datasets Research Articles

INTRODUCTION: Gene expression data analysis is a critical aspect of disease prediction and classification, playing a pivotal role in the field of bioinformatics and biomedical research. High-dimensional gene expression datasets hold a wealth of information, but their effective utilization is hindered by the presence of irrelevant dimensions and noise. The challenge lies in extracting meaningful features from these datasets to enhance the accuracy of disease prediction and classification while maintaining computational efficiency. Feature selection is a crucial step in addressing these challenges, as it aims to identify and retain only the most informative characteristics from large high-dimensional microarray datasets. In the context of microarray gene expression data, characterized by its substantial dimensionality, selecting relevant features is essential for efficient nearest neighbor search, a fundamental component of various analytical tasks in bioinformatics and data mining. Existing feature selection methods for high-dimensional data often face a trade-off between search accuracy and computational efficiency. This paper introduces a novel approach, the Nearest Neighbor Feature Selection with Symmetrical Uncertainty-based Redundancy Removal (NNFSRR) method, designed to enhance the classification of microarray gene expression data through feature selection. The NNFSRR method focuses on reducing the dimensionality of the dataset by identifying and removing redundant features, allowing subsequent searches to operate solely on relevant dimensions.
OBJECTIVES: The primary goal is to evaluate the NNFSRR method's effectiveness in improving nearest neighbor search in microarray gene expression datasets by reducing dimensionality. This method utilizes Symmetrical Uncertainty-based correlation between dimensions for feature selection and aims to enhance accuracy and efficiency compared to existing methods.
METHODS: The NNFSRR method uses Symmetrical Uncertainty to identify and remove redundant features from microarray gene expression datasets. Reduced datasets are used for nearest neighbor search, improving accuracy and efficiency. Experiments are conducted using real-world datasets, and comparisons with existing methods are made based on search time and accuracy.
RESULTS: The NNFSRR method demonstrates improved nearest neighbor search performance, outperforming basic brute force methods and existing feature selection techniques. Selected feature sets exhibit strong class associations while minimizing feature correlations, enhancing classification precision.
CONCLUSION: The NNFSRR method presents a promising approach to address the challenges posed by high-dimensional gene expression data. It effectively reduces dimensionality, improves search accuracy, and enhances the efficiency of nearest neighbor search. Our experimental results demonstrate that this method outperforms existing techniques in terms of search time and accuracy, making it a valuable tool for applications in bioinformatics, data mining, pattern recognition, and biological information retrieval. The NNFSRR method holds the potential to advance our understanding of complex biological processes and support more accurate disease prediction and classification.
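
As a rough illustration of the Symmetrical Uncertainty measure named in the abstract above, the following Python sketch computes SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) for discrete-valued features and applies it in a simple greedy redundancy filter. The function names, the 0.9 redundancy threshold, and the assumption that expression values have already been discretized are illustrative choices and are not taken from the NNFSRR paper, whose exact procedure is not given here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant, so there is nothing to share.
        return 0.0
    joint_entropy = entropy(list(zip(x, y)))
    mutual_information = hx + hy - joint_entropy
    return 2.0 * mutual_information / (hx + hy)


def select_features(features, labels, redundancy_threshold=0.9):
    """Greedy filter (hypothetical, for illustration only).

    features: dict mapping a feature name to a list of discrete values.
    Features are ranked by SU with the class labels; a feature is kept
    only if its SU with every already-kept feature is below the threshold.
    """
    ranked = sorted(
        features,
        key=lambda name: symmetrical_uncertainty(features[name], labels),
        reverse=True,
    )
    selected = []
    for name in ranked:
        if all(
            symmetrical_uncertainty(features[name], features[kept])
            < redundancy_threshold
            for kept in selected
        ):
            selected.append(name)
    return selected
```

In this sketch a feature that exactly duplicates an already selected one has SU equal to 1 with it and is dropped, which is the redundancy-removal idea the abstract describes. A practical filter would likely also discard features whose SU with the class labels is near zero.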

In today’s data-driven digital culture, there is a critical demand for optimized solutions that reduce operating expenses while increasing productivity. The memory and processing time available for handling enormous volumes of data are limited, and these limits become a greater problem when a dataset contains redundant and uninformative content. For instance, many datasets contain non-informative features that mislead a given classification algorithm. To tackle this, researchers have been developing a variety of feature selection (FS) techniques that aim to eliminate unnecessary information from the raw datasets before passing them to a machine learning (ML) algorithm. Meta-heuristic optimization algorithms are often a solid choice for solving NP-hard problems like FS. In this study, we present a wrapper FS technique based on the sparrow search algorithm (SSA), a type of meta-heuristic. SSA is a swarm intelligence (SI) method that stands out because of its quick convergence and improved stability. Like the majority of SI algorithms, SSA has some drawbacks, such as lower swarm diversity and weak exploration ability in late iterations. So, using ten chaotic maps, we try to improve SSA in three ways: (i) the initial swarm generation; (ii) the substitution of two random variables in SSA; and (iii) clamping the sparrows crossing the search range. As a result, we get CSSA, a chaotic form of SSA. Extensive comparisons show CSSA to be superior in terms of swarm diversity and convergence speed in solving various representative functions from the Institute of Electrical and Electronics Engineers (IEEE) Congress on Evolutionary Computation (CEC) benchmark set.
Furthermore, experimental analysis of CSSA on eighteen interdisciplinary, multi-scale ML datasets from the University of California Irvine (UCI) data repository, as well as three high-dimensional microarray datasets, demonstrates that CSSA outperforms twelve state-of-the-art algorithms in an FS-based classification task. Finally, a 5%-significance-level statistical post-hoc analysis based on Wilcoxon’s signed-rank test, Friedman’s rank test, and Nemenyi’s test confirms CSSA’s significance in terms of overall fitness, classification accuracy, selected feature size, computational time, convergence trace, and stability.
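
To make point (i) of the abstract above more concrete, here is a minimal Python sketch of chaotic initial swarm generation. It assumes the logistic map x_{k+1} = 4 x_k (1 - x_k) as the chaotic map; the abstract mentions ten maps without naming them, so this choice, the starting value 0.7, and all function names are assumptions made for illustration and not the authors' implementation.

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map; with r = 4 it is chaotic on (0, 1)."""
    return r * x * (1.0 - x)


def chaotic_initial_swarm(n_agents, n_dims, lower, upper, start=0.7):
    """Build an initial swarm from a chaotic sequence instead of a
    pseudo-random generator (hypothetical helper, for illustration only).

    Each coordinate is the next value of the chaotic sequence, scaled
    from [0, 1] to the search range [lower, upper].
    """
    swarm = []
    x = start
    for _ in range(n_agents):
        position = []
        for _ in range(n_dims):
            x = logistic_map(x)
            position.append(lower + x * (upper - lower))
        swarm.append(position)
    return swarm


# Example: 5 sparrows in a 3-dimensional search space bounded by [-10, 10].
swarm = chaotic_initial_swarm(n_agents=5, n_dims=3, lower=-10.0, upper=10.0)
```

The sequence is deterministic for a given starting value, so two runs with the same `start` produce the same swarm. Starting values such as 0, 0.25, 0.5, 0.75, and 1 should be avoided with this map because they lead to fixed points rather than chaotic behavior.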

Articles published on High-dimensional Microarray Datasets

Efficient and Intelligent Feature Selection via Maximum Conditional Mutual Information for Microarray Data

Optimal Feature Selection from High-dimensional Microarray Dataset Employing Hybrid IG-Jaya Model

Stable Feature Selection using Improved Whale Optimization Algorithm for Microarray Datasets

NNFSRR: Nearest Neighbor Feature Selection and Redundancy Removal Method for Nearest Neighbor Search in Microarray Gene Expression Data

A weighted-sum chaotic sparrow search algorithm for interdisciplinary feature selection and data classification

Bayesian weighted random forest for classification of high-dimensional genomics data

Hybrid Filter and Genetic Algorithm-Based Feature Selection for Improving Cancer Classification in High-Dimensional Microarray Data

Discrete equilibrium optimizer combined with simulated annealing for feature selection

Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks

Attribute reduction with personalized information granularity of nearest mutual neighbors

Enriched Random Forest for High Dimensional Genomic Data.

Binary Simulated Normal Distribution Optimizer for feature selection: Theory and application in COVID-19 datasets

AltWOA: Altruistic Whale Optimization Algorithm for feature selection on microarray datasets

A feature selection method for classification based on ensemble of penalized logistic models

Dynamic relevance and interdependent feature selection for continuous data

AIEOU: Automata-based improved equilibrium optimizer with U-shaped transfer function for feature selection

Improved salp swarm algorithm based on the levy flight for feature selection

Stability Investigation of Improved Whale Optimization Algorithm in the Process of Feature Selection

Cancer classification from microarray data for genomic disorder research using optimal discriminant independent component analysis and kernel extreme learning machine.

Feature Selection for High Dimensional Data Using Weighted K-Nearest Neighbors and Genetic Algorithm
