Abstract

Fundamental differences between the underlying mechanisms of lung adenocarcinoma (AC) and squamous cell carcinoma (SCC) motivated us to postulate that specific genes might exist that are relevant to the prognosis of each histology subtype. To test this hypothesis, we previously proposed a simple feature selection algorithm based on the Cox regression model and successfully identified some subtype-specific prognostic genes when applying it to real-world data. In this article, we continue our effort to identify subtype-specific prognostic genes for AC and SCC, and propose a novel embedded feature selection method that extends the Threshold Gradient Descent Regularization (TGDR) algorithm and minimizes the corresponding negative partial likelihood function. Using real-world and simulated datasets, we show that the two proposed methods have comparable performance, whereas the new proposal is superior in terms of model parsimony. Our analysis provides some evidence for the existence of such subtype-specific prognostic genes; more investigation is warranted.

Highlights

  • Microarray technology allows simultaneous monitoring of thousands of genes and measuring of their expression values

  • All data are publicly available from the Gene Expression Omnibus (GEO) repository and The Cancer Genome Atlas

  • Because “gene expression may carry the richest information on prognosis” [20], we only consider gene expression profiles for the prognosis of AC and SCC in this paper


Introduction

Microarray technology allows simultaneous monitoring of thousands of genes and measuring of their expression values. When data from a microarray experiment are analyzed, a feature selection algorithm, which reduces the number of genes to a small, manageable size, is essential for tackling the difficulties of high dimensionality, namely, that the number of genes is much larger than the number of samples. RNA sequencing (RNA-seq) has emerged as a novel technology for expression profiling and has replaced microarrays as the first choice for some biological research, e.g., transcriptomics [1]. RNA-seq data also face the challenge of high dimensionality, so a feature selection algorithm plays the same crucial role in RNA-seq data analysis as in microarray analysis.
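
To make the idea of embedded feature selection on survival data more concrete, the following is a minimal sketch of a plain threshold gradient descent loop applied to a Cox negative partial log-likelihood. It is not the extended TGDR method proposed in this article, only the generic technique under simplifying assumptions: no tied event times, a fixed step size, and a fixed threshold. The function names (cox_neg_partial_loglik_grad, tgdr_cox) and the parameters tau, step, and n_iter are hypothetical choices made for illustration.

```python
import numpy as np


def cox_neg_partial_loglik_grad(beta, X, time, event):
    """Gradient of the Cox negative partial log-likelihood.

    Assumes no tied event times (illustrative simplification).
    X: (n, p) expression matrix, time: (n,) follow-up times,
    event: (n,) indicator, 1 = event observed, 0 = censored.
    """
    order = np.argsort(-time)            # descending time: risk set of row i is rows 0..i
    X, event = X[order], event[order]
    eta = X @ beta
    w = np.exp(eta - eta.max())          # shift for stability; cancels in the ratio below
    cum_w = np.cumsum(w)
    cum_wx = np.cumsum(w[:, None] * X, axis=0)
    risk_mean = cum_wx / cum_w[:, None]  # weighted covariate mean over each risk set
    return -((X - risk_mean) * event[:, None]).sum(axis=0)


def tgdr_cox(X, time, event, tau=0.9, step=0.001, n_iter=200):
    """Threshold gradient descent: update only coefficients whose gradient
    magnitude is at least tau times the largest gradient magnitude."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = cox_neg_partial_loglik_grad(beta, X, time, event)
        g_max = np.abs(g).max()
        if g_max == 0:
            break
        mask = np.abs(g) >= tau * g_max  # tau near 1 gives sparser models
        beta -= step * g * mask
    return beta
```

In this sketch, genes whose coefficients are never updated stay at exactly zero, which is how the procedure selects features while fitting the model. In practice tau and the number of iterations would be tuned, for example by cross-validation.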
