Iterative Feature Selection Research Articles

Abstract Background: In clinical practice, it is a common challenge to correctly classify disease against normal cases and to identify disease subtypes. This is especially difficult in complex diseases such as cancer, where patterns are heterogeneous, highly complex interactions of pathways are involved, and continuous multi-level genomic changes occur. Using genomics data analyzed by unsupervised and supervised machine learning tools, we demonstrate the ability to quantitatively and accurately describe a patient's tumor. Design: We used Level III RNASeq gene expression data for 20531 genes in 889 tumor samples of renal cell carcinoma (RCC) across three subtypes: clear cell renal cell carcinoma (CCRCC), papillary renal cell carcinoma (PRCC), and chromophobe carcinoma (ChRCC), together with 129 normal samples from The Cancer Genome Atlas (TCGA). We developed a computational framework for feature (gene/transcript) selection and subtype predictive model construction. This framework relies on the well-known “random forest” (RF) method with iterative feature selection and 10-fold cross-validation. We performed a series of (1) tumor versus normal tissue experiments for each subtype; (2) pairwise subtype comparisons; and finally (3) a comparison of all three subtypes with identification of predictive genes. Results: In each computational run, our method detected the 2054 (10%) top varying genes and estimated the predictive power of each selected gene using RF. On average, this method demonstrated 93-97% accuracy. We identified genes known to play a role in renal cell carcinoma, for example CA9, LOX, SFRP1, SLC4A1, CDKN2A, KISS1R, EGF and others. In addition, our analysis uncovered genes not previously associated with renal cell carcinoma that may represent characteristic patterns for subtyping and for differentiation from normal renal tissue cells, for example TCF21, IRX1, STC2, UMOD, AQP2, ANGPTL4, BSND and FABP7.
In three different experiments, we differentiated each of the three subtypes from normal tissue and performed enrichment analysis for the most significant genes in each case. We observed that both CCRCC and PRCC have genes involved in the “glycosaminoglycan biosynthesis - heparan sulfate” (HS6ST2, HS6ST3) and “riboflavin metabolism” (ACPP) pathways. In contrast, ChRCC is more strongly associated with the “glycosphingolipid biosynthesis - lacto and neolacto series” pathway (B3GNT3, FUT6) and has five genes (B3GNT3, CYP2B6, CYP2J2, FUT6, UGT2A3) involved in other metabolic pathways. Changes in glycosaminoglycans and glycolipids have also previously been reported in association with RCC. Conclusions: We demonstrate the effectiveness of a computational framework and the predictive power of gene expression data for tumor subtyping in RCC. Our framework is generic and can be applied in combination with other types of data, such as different modalities of genomic data (copy number variations, methylation), as well as clinical data. Citation Format: Konstantin Volyanskyy, Yong Mao, Yee Him Cheung, Balaji Santhanam, Vlado Menkovski, Zharko Aleksovski, Minghao Zhong, John T. Fallon, Nevenka Dimitrova. Normal versus tumor and subtype prediction in renal cell carcinoma TCGA data sets. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 3651.
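The pre-filtering step in the abstract above (keeping the 2054 top varying genes out of 20531, reported as 10%) can be illustrated with a small sketch. It assumes that "top varying" means ranking genes by variance across samples and that the 10% count is rounded up; the abstract states neither, and the function names and toy data are hypothetical. The random forest and cross-validation stages are not shown.

```python
import statistics


def n_to_keep(n_genes, percent=10):
    """Number of genes kept when retaining `percent` % of them, rounded up.

    Integer arithmetic avoids floating-point surprises in the ceiling.
    Rounding up is an assumption; it reproduces 2054 from 20531 genes.
    """
    return (n_genes * percent + 99) // 100


def top_varying_genes(expression, percent=10):
    """Return the names of the most variable genes, highest variance first.

    expression: dict mapping gene name -> list of expression values,
                one value per sample (hypothetical input layout).
    """
    keep = n_to_keep(len(expression), percent)
    # Sort by descending population variance; break ties by gene name
    # so the result is deterministic.
    ranked = sorted(
        expression,
        key=lambda gene: (-statistics.pvariance(expression[gene]), gene),
    )
    return ranked[:keep]


# Hypothetical toy expression matrix: 5 genes measured in 4 samples.
toy_expression = {
    "GENE_A": [5.0, 5.0, 5.0, 5.0],
    "GENE_B": [0.0, 10.0, 0.0, 10.0],
    "GENE_C": [1.0, 2.0, 1.0, 2.0],
    "GENE_D": [0.0, 4.0, 0.0, 4.0],
    "GENE_E": [3.0, 3.0, 3.0, 4.0],
}

selected = top_varying_genes(toy_expression, percent=40)
```

With 20531 genes and a 10% cutoff, `n_to_keep` gives 2054, matching the count in the abstract. In the toy data, the two genes with the largest variance are retained.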

Abstract Background: Although the management of metastatic lung adenocarcinoma has been profoundly modified by the identification of actionable molecular traits, decision making for early-stage lung adenocarcinoma still relies on the tumor stage (TNM) only. Most of these patients recur within 5 years after tumor resection and are given platinum-based adjuvant chemotherapy. The recent availability of multiple gene expression datasets and the concomitant development of novel algorithms, such as the Hypercube algorithm, could dramatically improve such predictions. Methods: Datasets were selected from public repositories (Director's Challenge (DC) n=274, Rousseaux n=86, Hou n=35, JBR.21 n=31) according to the following criteria: lung adenocarcinoma, stage IA-3A, no adjuvant therapy, R0 tumor resection. To generate robust predictors of 3-year overall survival, an iterative feature selection framework (Hypercube algorithm) was applied successively to DC and Hou. Only rules that were highly predictive in both datasets (AUC > .7) were then included in a subsequent logistic regression model and finally assessed for external validation. The predictive performance of the models was evaluated by ROC-AUC and by Kaplan-Meier curves. Results: Our best model, combining gene expression (10 genes) and clinical data (TNM), was robust across three lung adenocarcinoma datasets (AUC = .85, .80 and .59). In particular, it predicted 3-year survival better than the TNM-only clinical model (100% vs. 77%). Kaplan-Meier estimates of overall survival in the validation datasets were dramatically discriminated according to the high- and low-risk groups predicted by our model (all P values = .01). Interestingly, this predictive classifier was specific to lung adenocarcinoma and had no predictive value in lung squamous datasets.
Conclusion: We developed a robust and high-performing gene expression predictive signature of 3-year overall survival, specific to the lung adenocarcinoma setting. These promising results have the potential to change decision making at the bedside for early-stage lung adenocarcinoma patients. Citation Format: Yann Gaston-Mathé, Charles Ferte, Benoit Gauthier, Mathilde Bateson, David Planchard, Benjamin Besse, Jean-Pierre Armand, Jean-Charles Soria. Gene expression profiling robustly predicts the outcome of patients diagnosed with early stage lung adenocarcinoma. [abstract]. In: Proceedings of the 105th Annual Meeting of the American Association for Cancer Research; 2014 Apr 5-9; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2014;74(19 Suppl):Abstract nr 2357. doi:10.1158/1538-7445.AM2014-2357
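The cross-dataset filter described in the Methods above (keeping only rules with AUC > .7 in both discovery datasets) can be sketched as follows. This is not the Hypercube algorithm, whose internals the abstract does not describe; it only illustrates the filtering criterion. The rule names, scores, and labels are hypothetical, and the sketch assumes that a higher score indicates the positive class.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def rules_passing_both(dataset_a, dataset_b, threshold=0.7):
    """Return rule names whose AUC exceeds `threshold` in both datasets.

    Each dataset is a (labels, scores_by_rule) pair, where scores_by_rule
    maps a rule name to one score per patient (hypothetical layout).
    """
    labels_a, scores_a = dataset_a
    labels_b, scores_b = dataset_b
    kept = []
    for rule in sorted(set(scores_a) & set(scores_b)):
        auc_a = roc_auc(scores_a[rule], labels_a)
        auc_b = roc_auc(scores_b[rule], labels_b)
        if auc_a > threshold and auc_b > threshold:
            kept.append(rule)
    return kept


# Hypothetical toy data standing in for two discovery datasets.
toy_a = (
    [1, 1, 0, 0],
    {
        "rule_1": [0.9, 0.8, 0.2, 0.1],  # AUC 1.00
        "rule_2": [0.9, 0.1, 0.8, 0.2],  # AUC 0.50
        "rule_3": [0.9, 0.5, 0.6, 0.1],  # AUC 0.75
    },
)
toy_b = (
    [1, 0, 1, 0],
    {
        "rule_1": [0.7, 0.3, 0.6, 0.2],  # AUC 1.00
        "rule_2": [0.9, 0.1, 0.8, 0.2],  # AUC 1.00
        "rule_3": [0.4, 0.5, 0.3, 0.2],  # AUC 0.50
    },
)

robust_rules = rules_passing_both(toy_a, toy_b)
```

In the toy data only `rule_1` clears the threshold in both datasets. `rule_2` and `rule_3` each look predictive in one dataset but not the other, which is the kind of dataset-specific signal the criterion is meant to screen out before the logistic regression step.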

Articles published on Iterative Feature Selection

PREDICTION OF TREATMENT OUTCOME FOR AUTISM FROM STRUCTURE OF THE BRAIN BASED ON SURE INDEPENDENCE SCREENING.

M3U: Minimum Mean Minimum Uncertainty Feature Selection for Multiclass Classification

Iterative Consistency-Based Feature Selection and Its Application to Nucleotide Sequences of Influenza A Viruses

280 A screening questionnaire for transient loss of consciousness

A Fast, Open EEG Classification Framework Based on Feature Compression and Channel Ranking

Prognostic value of combining a quantitative image feature from positron emission tomography with clinical factors in oligometastatic non-small cell lung cancer

Tracing the breeding farm of domesticated pig using feature selection (Sus scrofa).

Abstract 3651: Normal versus tumor and subtype prediction in renal cell carcinoma TCGA data sets

The dominance-based rough set approach to cylindrical plunge grinding process diagnosis

Real-time facial action unit intensity prediction with regularized metric learning

Iterative ensemble feature selection for multiclass classification of imbalanced microarray data.

Fused stagewise regression — A waveband selection algorithm for spectroscopy

Accurate and Stable Feature Selection Powered by Iterative Backward Selection and Cumulative Ranking Score of Features

Abstract 2357: Gene expression profiling robustly predicts the outcome of patients diagnosed with early stage lung adenocarcinoma

A Novel Gene Expression Signature that Robustly Predicts the Outcome of Patients Diagnosed with Stage 1 Nsclc

THE USE OF UNDER- AND OVERSAMPLING WITHIN ENSEMBLE FEATURE SELECTION AND CLASSIFICATION FOR SOFTWARE QUALITY PREDICTION

Drug activity prediction using multiple-instance learning via joint instance and feature selection

Extensive diversity in circadian regulation of plasma lipids and evidence for different circadian metabolic phenotypes in humans

DBSI: DNA-binding site identifier

Identification of Molecular Subtypes of Gastric Cancer With Different Responses to PI3-Kinase Inhibitors and 5-Fluorouracil
