Abstract

Background: In cancer prognosis studies with gene expression measurements, an important goal is to construct gene signatures with predictive power. In this study, we describe the coordination among genes using the weighted coexpression network, where nodes represent genes and two nodes are connected if the corresponding genes have similar expression patterns across samples. There are subsets of nodes, called modules, that are tightly connected to each other. Several published studies have suggested that the first principal components of individual modules, also referred to as "eigengenes", may sufficiently represent the corresponding modules.

Results: In this article, we refer to principal components and their functions as "representative features". We investigate higher-order representative features, which include the principal components other than the first ones and second-order terms (quadratics and interactions). Two gradient thresholding methods are adopted for regularized estimation and feature selection. Analysis of six prognosis studies on lymphoma and breast cancer shows that incorporating higher-order representative features improves prediction performance over using eigengenes only. A simulation study further shows that prediction performance can be less satisfactory if the representative feature set is not properly chosen.

Conclusions: This study introduces multiple ways of defining the representative features and effective thresholding regularized estimation approaches. It provides convincing evidence that the higher-order representative features may have important implications for the prediction of cancer prognosis.

Highlights

  • In cancer prognosis studies with gene expression measurements, an important goal is to construct gene signatures with predictive power

  • Gene signatures have been constructed for the prognosis of breast cancer, lymphoma, ovarian cancer, and cancers of many other organs [1]

  • For cancer prognosis studies with gene expression measurements, we describe the interplay among genes using the weighted coexpression network and use principal component analysis techniques to reduce the dimensionality of gene expressions


Summary

Introduction

In cancer prognosis studies with gene expression measurements, an important goal is to construct gene signatures with predictive power. We describe the coordination among genes using the weighted coexpression network, where nodes represent genes and two nodes are connected if the corresponding genes have similar expression patterns across samples. High-throughput profiling has been extensively conducted in search of genomic signatures with predictive power for traits or clinical outcomes. We analyze cancer prognosis studies, where the clinical outcomes are metastasis-free, overall, or other types of survival. We focus on microarray gene expression studies but note that the proposed approach is applicable to data generated using other profiling techniques. Gene signatures have been constructed for the prognosis of breast cancer, lymphoma, ovarian cancer, and cancers of many other organs [1].
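
As a rough illustration of the weighted coexpression network described above, the sketch below builds a weighted adjacency matrix from gene expression profiles. It assumes the commonly used power form, in which the connection strength between two genes is the absolute Pearson correlation raised to a power beta; the text here does not state the exact adjacency function, so the form, the default beta = 6, the zero diagonal, and the function names are all assumptions made for illustration. The toy data are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def weighted_adjacency(expr, beta=6):
    """Build a symmetric weighted adjacency matrix.

    expr: list of gene profiles, each a list of values across samples.
    Assumed form (not taken from the text): a[i][j] = |cor(i, j)| ** beta,
    with the diagonal left at 0 (no self-connections).
    """
    p = len(expr)
    adj = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            w = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = w
    return adj


# Toy data: 3 genes measured on 4 samples (made-up numbers).
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # same pattern as gene 0, so strongly connected
    [1.0, -1.0, 1.0, -1.0],  # dissimilar pattern, so weakly connected
]
adj = weighted_adjacency(expr)
```

Raising the correlation to a power shrinks weak correlations toward zero while leaving strong ones nearly intact, which is one way tightly connected groups of genes (modules) come to stand out in the resulting network.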

