Abstract

Method

Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g., principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results can be assessed by bootstrapping.

Results

I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.

PLOS ONE | DOI:10.1371/journal.pone.0143196 | November 17, 2015
GO-PCA: Exploring Gene Expression Data Using Prior Knowledge

Highlights

  • Genome-wide expression profiling, or transcriptomics, is a highly popular approach for obtaining a systematic view of the molecular differences and similarities among cells, tissues, tumor biopsies, or other biological specimens

  • The high-dimensional and heterogeneous nature of transcriptomic data often makes it difficult to interpret the output of generic unsupervised algorithms, and technical artifacts can lead to the identification of biologically irrelevant clusters or factors [12] that further complicate the analysis

  • I introduced an exploratory method that first performs PCA to identify all major axes of variation, and then uses GO enrichment analysis to test whether functionally related genes drive each principal component (PC)
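
The two steps in the last highlight can be illustrated with a minimal Python sketch. It is not the GO-PCA implementation: the paper describes a nonparametric enrichment test, whereas this sketch substitutes a plain hypergeometric test on the genes with the largest absolute loadings. The function names (`hypergeom_sf`, `enrichment_per_pc`), the `n_top` cutoff, and the toy data with a planted five-gene set are all assumptions made for illustration.

```python
import math
import numpy as np


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N genes, K of which are in the set."""
    total = math.comb(N, n)
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrichment_per_pc(expr, gene_set, n_pcs=2, n_top=5):
    """Test one gene set against the top-loading genes of each PC.

    expr: genes x samples array; gene_set: set of row indices (hypothetical
    stand-in for the genes annotated with one GO term).
    """
    # Center each gene across samples, then obtain PCs via SVD.
    centered = expr - expr.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    results = []
    for pc in range(n_pcs):
        # Rank genes by absolute loading on this PC (a simplification).
        order = np.argsort(-np.abs(U[:, pc]))
        top = set(order[:n_top].tolist())
        k = len(top & gene_set)
        p = hypergeom_sf(k, expr.shape[0], len(gene_set), n_top)
        results.append((pc + 1, k, p))
    return results


# Toy data: 20 genes x 6 samples; genes 0-4 share a strong common pattern.
rng = np.random.default_rng(0)
pattern = np.array([3.0, 3.0, 3.0, -3.0, -3.0, -3.0])
expr = rng.normal(scale=0.1, size=(20, 6))
expr[:5] += pattern

results = enrichment_per_pc(expr, gene_set={0, 1, 2, 3, 4})
for pc, k, p in results:
    print(f"PC {pc}: {k} of 5 top genes in set, p = {p:.2e}")
```

In this toy setting the planted genes dominate the first PC, so the gene set receives a small p-value there. A real analysis would loop over many GO terms and would need to correct for multiple testing.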


Summary

Introduction

Genome-wide expression profiling, or transcriptomics, is a highly popular approach for obtaining a systematic view of the molecular differences and similarities among cells, tissues, tumor biopsies, or other biological specimens. Popular methods for the unsupervised analysis of such data include principal component analysis (PCA) [5], hierarchical clustering [6], k-means clustering, consensus clustering [7], non-negative matrix factorization (reviewed in [8]), mixture models (e.g., [9]), and many others. These methods can be characterized as generic, in that they operate based on general principles (e.g., principal components are uncorrelated and capture maximum amounts of variance), and do not take any specific biological aspects of the data into account.
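
The general principles mentioned for PCA can be checked numerically. The following Python sketch uses a synthetic genes-by-samples matrix (random data, not from the paper) and computes PCA via singular value decomposition. It shows that the PC scores are mutually uncorrelated and that the variance captured by successive PCs never increases. All variable names are illustrative.

```python
import numpy as np

# Synthetic expression matrix: 50 genes x 8 samples (illustrative only).
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 8))

# Center each gene across samples, then decompose.
centered = expr - expr.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Row i of `scores` holds the coordinates of all samples on PC i+1.
scores = s[:, None] * Vt

# Fraction of total variance captured by each PC (sorted high to low).
explained = s**2 / np.sum(s**2)

# Cross-products between different PCs should vanish (uncorrelated scores).
cov = scores @ scores.T
off_diag = cov - np.diag(np.diag(cov))

print("explained variance fractions:", np.round(explained, 3))
print("largest off-diagonal cross-product:", np.abs(off_diag).max())
```

Nothing in this computation refers to gene function, which is the sense in which the text calls such methods generic.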
