Large Microarray Datasets Research Articles

INTRODUCTION: Gene expression data analysis is central to disease prediction and classification in bioinformatics and biomedical research. High-dimensional gene expression datasets are rich in information, but irrelevant dimensions and noise hinder their effective use. The challenge is to extract meaningful features that improve the accuracy of disease prediction and classification while keeping computation efficient. Feature selection addresses this challenge by identifying and retaining only the most informative features of large, high-dimensional microarray datasets. Because microarray gene expression data are so high-dimensional, selecting relevant features is essential for efficient nearest neighbor search, a fundamental component of many analytical tasks in bioinformatics and data mining. Existing feature selection methods for high-dimensional data often trade search accuracy against computational efficiency. This paper introduces the Nearest Neighbor Feature Selection with Symmetrical Uncertainty-based Redundancy Removal (NNFSRR) method, designed to improve the classification of microarray gene expression data through feature selection. NNFSRR reduces the dimensionality of the dataset by identifying and removing redundant features, so that subsequent searches operate only on relevant dimensions.

OBJECTIVES: The primary goal is to evaluate how effectively NNFSRR improves nearest neighbor search in microarray gene expression datasets by reducing dimensionality. The method selects features using Symmetrical Uncertainty-based correlation between dimensions and aims to improve on the accuracy and efficiency of existing methods.

METHODS: NNFSRR uses Symmetrical Uncertainty to identify and remove redundant features from microarray gene expression datasets. Nearest neighbor search is then run on the reduced datasets. Experiments are conducted on real-world datasets, and the method is compared with existing methods on search time and accuracy.

RESULTS: NNFSRR improves nearest neighbor search performance, outperforming both basic brute-force search and existing feature selection techniques. The selected feature sets are strongly associated with the class while being only weakly correlated with one another, which improves classification precision.

CONCLUSION: NNFSRR is a promising approach to the challenges posed by high-dimensional gene expression data. It reduces dimensionality, improves search accuracy, and makes nearest neighbor search more efficient. The experimental results show that it outperforms existing techniques in search time and accuracy, making it a useful tool for bioinformatics, data mining, pattern recognition, and biological information retrieval. The method has the potential to advance the understanding of complex biological processes and to support more accurate disease prediction and classification.
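The abstract above does not spell out the NNFSRR algorithm, so the following is only a minimal sketch of the general idea it describes: score features with Symmetrical Uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), drop features that are redundant with ones already kept, and run nearest neighbor search on the remaining dimensions. The greedy redundancy rule, the equal-width binning, and every function name here (`discretize`, `select_features`, `nearest_neighbor`) are assumptions made for illustration, not the authors' implementation, and the toy values are made up.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 means independent, 1 means identical."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def discretize(column, bins=3):
    """Equal-width binning of a continuous expression column (an assumed preprocessing step)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]


def select_features(columns, labels, relevance_threshold=0.0):
    """Assumed greedy filter: rank features by SU with the class, then keep a
    feature only if it is less tied to every kept feature than to the class."""
    relevance = {j: symmetrical_uncertainty(col, labels) for j, col in enumerate(columns)}
    ranked = sorted(
        (j for j in relevance if relevance[j] > relevance_threshold),
        key=lambda j: -relevance[j],
    )
    kept = []
    for j in ranked:
        if all(symmetrical_uncertainty(columns[j], columns[k]) < relevance[j] for k in kept):
            kept.append(j)
    return kept


def nearest_neighbor(query, samples, kept):
    """Index of the sample (row) closest to query, using only the kept dimensions."""
    def squared_distance(row):
        return sum((row[j] - query[j]) ** 2 for j in kept)
    return min(range(len(samples)), key=lambda i: squared_distance(samples[i]))


# Toy data (made up): 6 samples, 3 already-discretized features stored column-wise.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_columns = [
    [0, 0, 0, 1, 1, 1],  # tracks the class
    [0, 0, 0, 1, 1, 1],  # exact copy of the first feature, hence redundant
    [0, 1, 0, 1, 0, 1],  # weakly related to the class
]
toy_kept = select_features(toy_columns, toy_labels)
toy_rows = [list(row) for row in zip(*toy_columns)]  # row-wise view for the search
print(toy_kept, nearest_neighbor([1, 1, 1], toy_rows, toy_kept))
```

Restricting the distance computation to the kept indices is what makes the later search cheaper: its cost scales with the number of retained dimensions rather than with the full gene count.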

The abundance and functional orientation of tumor-infiltrating effector cells have long been observed to predict a reduced incidence of clinical metastasis and cancer-specific death. Using bioinformatics to mine large breast tumor microarray datasets, we and others have identified prognostic immune gene signatures, or metagenes. Robust evidence indicates that these metagenes are: 1) positively correlated with distant metastasis-free survival (DMFS) of patients, 2) composed of genes that regulate immune cell-specific biology, and 3) reflective of the relative abundance of discernible populations of tumor-infiltrating leukocytes. In recent work we have leveraged the statistical associations between the immune metagenes and the DMFS of breast cancer patients to explore the underlying phenotypes that differ in their ability to potentiate long-term, immune-mediated tumor rejection. Using a tumor classification model that combines the prognostic attributes of three distinct immune metagenes, termed the B/P, T/NK and M/D metagenes, we have identified molecular subtypes of breast cancer that either permit or prohibit prognostication by the immune metagenes. On this basis, we have delineated the phenotypic attributes of breast cancer that distinguish two novel immunogenic tumor subtypes, which we have defined as immune benefit-enabled (IBE) and immune benefit-disabled (IBD). Phenotypically, IBE tumors comprise Basal-like tumors and highly proliferative HER2-Enriched and Luminal-B subtypes, while IBD tumors comprise Claudin-Low, Luminal-A, and low-proliferative HER2-Enriched and Luminal-B tumors. Prognostically, IBE tumors (n = 666) can be stratified by the immune metagene model into prognostic subgroups with high statistical significance (P < 0.0001, log-rank test), while IBD tumors cannot (n = 1005, P = 0.3), consistent with the capacity for an innate anti-tumor immunity against IBE tumors, but not IBD tumors, that guards against distant metastasis. Furthermore, these observations were independent of adjuvant treatment and may be due to differential activation of immunomodulatory pathways. Network analysis revealed that IBE/IBD differentially expressed genes (q < 0.01) underlie highly significant pathway activation scores for TGF-beta signaling in IBD (p < 0.0001) and Interferon-gamma signaling in IBE (p < 0.0001). In addition, 15 of 19 genes comprising the previously described Immunologic Constant of Rejection (Marincola and colleagues) were significantly overexpressed in IBE tumors (P-value range: 0.05-3.5E-14). Thus, we conclude that breast tumors can be dichotomized into two subtypes that are fundamentally distinct with respect to their potential for metastasis-protective immune responsiveness. These findings indicate new contexts for studying anti-tumor immunity and oncogenic mechanisms of immunosuppression in breast cancer. Whether IBE and IBD subtypes represent clinically relevant contexts for assessing patient prognosis or evaluating the efficacy of immunotherapeutic treatments warrants further investigation (see Figure 1).

Figure 1 (A-D) The immune metagenes are prognostic of IBE but not IBD breast cancer. (A) Heatmaps of metagene expression levels (rows) across 1,954 tumors (columns). Key shows color scale of mean-centered, log2-transformed gene signal intensities. For each metagene, ...
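The abstract reports that metagene-defined subgroups separate by DMFS under a log-rank test in IBE tumors but not in IBD tumors. As a rough illustration of what that test computes, the sketch below implements a standard two-group log-rank statistic using only the Python standard library. The function name `logrank_test` and the follow-up times are hypothetical and made up for illustration; they are not the study's data or code.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events_* hold 1 for an observed event (e.g. distant metastasis) and 0 for
    censoring. Returns (chi-square statistic, p-value) on 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        n = at_risk_a + at_risk_b
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        observed_a += d_a
        expected_a += d * at_risk_a / n  # events expected in group A if the groups do not differ
        if n > 1:
            # Hypergeometric variance of the group A event count at this time point.
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up follow-up times (years) for a "low" and a "high" metagene subgroup.
low_times, low_events = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
high_times, high_events = [6, 7, 8, 9, 10], [1, 1, 1, 0, 0]
print(logrank_test(low_times, low_events, high_times, high_events))
```

A small p-value indicates that the subgroups have different event-free survival, which is the sense in which the abstract says the metagene model stratifies IBE tumors but not IBD tumors.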

Articles published on Large Microarray Datasets

NNFSRR: Nearest Neighbor Feature Selection and Redundancy Removal Method for Nearest Neighbor Search in Microarray Gene Expression Data

Novel Feature Selection Algorithms Based on Crowding Distance and Pearson Correlation Coefficient

Survivin' Acute Myeloid Leukaemia-A Personalised Target for inv(16) Patients.

Selecting Relevant Genes From Microarray Datasets Using a Random Forest Model

A transcriptome-based classifier to determine molecular subtypes in medulloblastoma.

Transcriptomic Profiling for the Autophagy Pathway in Colorectal Cancer.

Blood transcriptome profiling captures dysregulated pathways and response to treatment in neuroimmunological disease.

Generalized gene co-expression analysis via subspace clustering using low-rank representation

Epigenetic regulation of placental gene expression in transcriptional subtypes of preeclampsia

Fuzzy Expert System based on a Novel Hybrid Stem Cell (HSC) Algorithm for Classification of Micro Array Data.

Comparison of statistical methods for subnetwork detection in the integration of gene expression and protein interaction network

Dynamic Genetic Algorithm-Based Feature Selection and Incomplete Value Imputation for Microarray Classification

Partial correlation screening for estimating large precision matrices, with applications to classification

Genetic Networks in Mouse Retinal Ganglion Cells.

Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

A COMPARATIVE STUDY ON GENE SELECTION METHODS FOR TISSUES CLASSIFICATION ON LARGE SCALE GENE EXPRESSION DATA

TIGER: an evolutionary search for Top Inter-GEne Relations

DECODE: an integrated differential co-expression and differential expression analysis of gene expression data.

Rough Based Symmetrical Clustering for Gene Expression Profile Analysis.

Immune gene signatures and tumor intrinsic markers delineate novel immunogenic subtypes of breast cancer
