Abstract

Background. Many gene-expression signatures exist for describing the biological state of profiled tumors. Principal Component Analysis (PCA) can be used to summarize a gene signature into a single score. Our hypothesis is that gene signatures can be validated when applied to new datasets, using inherent properties of PCA.

Results. This validation is based on four key concepts. Coherence: elements of a gene signature should be correlated beyond chance. Uniqueness: the general direction of the data being examined can drive most of the observed signal. Robustness: if a gene signature is designed to measure a single biological effect, then this signal should be sufficiently strong and distinct compared to other signals within the signature. Transferability: the derived PCA gene signature score should describe the same biology in the target dataset as it does in the training dataset.

Conclusions. The proposed validation procedure ensures that PCA-based gene signatures perform as expected when applied to datasets other than those that the signatures were trained upon. Complex signatures, describing multiple independent biological components, are also easily identified.

Highlights

  • The use of gene signatures and Principal Component Analysis [1] (PCA) is a popular combination, but a recent publication has clearly shown drawbacks with this combination [2]

  • A coherent gene signature is an indication that a common mechanism or biological pathway is measured

  • If a gene signature describes more than one distinct biological effect, more than one PC will be significant (Figure 1(b))

Introduction

The use of gene signatures and Principal Component Analysis (PCA) [1] is a popular combination, but a recent publication has clearly shown drawbacks with this combination [2]. In this article, we focus on how to quantify the validity of applying PCA-based gene signatures to new datasets. PCA is a technique that reduces a high-dimensional dataset to a small number of new variables (principal components) while retaining most of the variation in the data. The values of these new variables are referred to as scores, t, and the importance (weighting) of each original variable is given by the loadings, p. Complex signatures, describing multiple independent biological components, can also be identified.
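As an illustration of the scores t and loadings p described above, the following is a minimal sketch of how a PCA-based signature score might be computed. It is not the authors' implementation: the function name `pca_signature_score`, the per-gene standardisation step, and the synthetic data are all assumptions made for this example. The expression matrix is assumed to hold samples in rows and the signature genes in columns.

```python
import numpy as np

def pca_signature_score(expr, n_components=2):
    """Hypothetical helper: PCA of a samples x genes signature matrix.

    Returns scores (t), loadings (p) and the fraction of variance
    explained by each of the first n_components components.
    """
    X = np.asarray(expr, dtype=float)
    # Centre each gene; scaling to unit variance is an assumption here,
    # chosen so that no single gene dominates the components.
    X = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    X = X / np.where(sd > 0, sd, 1.0)
    # Singular value decomposition of the preprocessed data: X = U S Vt.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # t: one row per sample
    loadings = Vt[:n_components].T                   # p: one row per gene
    explained = S**2 / np.sum(S**2)
    return scores, loadings, explained[:n_components]

# Synthetic example (assumed data): 50 samples, 10 genes that all follow
# one latent biological effect plus independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=50)
expr = latent[:, None] + 0.5 * rng.normal(size=(50, 10))

scores, loadings, explained = pca_signature_score(expr)
signature_score = scores[:, 0]  # first-component score per sample
print("variance explained by PC1, PC2:", explained)
```

In this sketch the first column of `scores` plays the role of the single signature score. The sign of a principal component is arbitrary, so the score may need to be flipped to match the expected direction of the biology. Comparing the variance explained by the first and second components gives a rough indication of whether one effect dominates the signature.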
