PCA2GO: a new multivariate statistics based method to identify highly expressed GO-Terms

Marc Bruckskotten, Marcus Krüger, Thomas Braun, Anne Konzer, Mario Looso, Jürgen Hemberger, Franz Cemiĉ

doi:10.1186/1471-2105-11-336


Open Access

https://doi.org/10.1186/1471-2105-11-336


Abstract

Background

Several tools have been developed to explore and search Gene Ontology (GO) databases, allowing efficient GO enrichment analysis and GO tree visualization. Nevertheless, identifying highly specific GO terms in complex data sets is relatively complicated, and displaying GO term assignments and GO enrichment analysis as simple tables or pie charts is not optimal. Valuable information, such as the hierarchical position of a single GO term within the GO tree (topological ordering) or enrichment within a complex set of biological experiments, is not displayed. Pie charts based on GO tree levels are themselves one-dimensional graphs, which cannot properly or efficiently represent the hierarchical specificity for the biological system being studied.

Results

Here we present a new method, which we name PCA2GO, capable of GO analysis in complex multidimensional experimental settings. We employed principal component analysis (PCA) and developed a new score that takes into account the relative frequency of certain GO terms and their specificity (hierarchical position) within the GO graph. We evaluated the correlation between our representation score R and a standard measure of enrichment, namely p-values, to show how our approach relates to other methods and to point out differences between our method and commonly used enrichment analyses. Although p-values and the R score formally measure different quantities, they should be correlated, because the relative frequencies of GO term occurrences within a data set are an indirect measure of the number of proteins related to each term and are therefore also related to enrichment. We showed that our score enables us to identify more specific GO terms, i.e. those positioned further down the GO graph, than other common tools used for this purpose. PCA2GO allows visualization and detection of multidimensional dependencies both within the acyclic graph (GO tree) and across the experimental settings. Unlike standard enrichment tools, which analyze a single set, our method is intended for the analysis of several experimental sets. To demonstrate the usefulness of our approach, we performed a PCA2GO analysis of a fractionated cardiomyocyte protein data set, which was identified by gel-enhanced liquid chromatography-mass spectrometry (GeLC-MS). The analysis enabled us to detect distinct groups of proteins that accurately reflect properties of the biochemical cell fractions.

Conclusions

We conclude that PCA2GO is an efficient alternative GO analysis tool with unique features for detecting and visualizing multidimensional dependencies within the data set under study. PCA2GO reveals strongly correlated GO terms within the experimental setting (in this case, different fractions) through PCA group formation, and it detects more specific GO terms within experiment-dependent GO term groups than standard p-value calculations do.
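For orientation, the p-values that the R score is compared against usually come from an over-representation test. The abstract does not say which test was used, so the sketch below assumes the one-sided hypergeometric test that many enrichment tools rely on. The function name, parameter names, and toy numbers are illustrative assumptions, not taken from the paper.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: total number of proteins in the background
    annotated:  background proteins that carry the GO term
    sample:     proteins in the experimental set (e.g. one fraction)
    hits:       proteins in the set that carry the GO term
    """
    upper = min(sample, annotated)
    # Count the ways to draw `sample` proteins with at least `hits` annotated.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


# Invented toy numbers: 10 background proteins, 5 carry the term,
# and a fraction of 5 proteins in which all 5 carry the term.
p = enrichment_p_value(population=10, annotated=5, sample=5, hits=5)
print(p)  # equals 1/252, roughly 0.004
```

A small p-value only says that a term is over-represented in one set; it carries no information about how deep the term sits in the GO graph, which is the gap the R score is described as addressing.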

Highlights

Introduction to principal component analysis (PCA): the idea is to map the investigated complex system from a multidimensional space to a reduced space spanned by a few principal components (PCs), thereby revealing the most important features underlying the data set.
To detect strongly represented Gene Ontology (GO) branches more specifically than commonly used methods do, we developed a representation score R that, combined with PCA, takes into account the relative frequencies of gene product occurrences within the data set and the topological ordering of the GO directed acyclic graph (DAG).
PCA2GO does not extend the list of over-represented GO terms obtained by standard p-value-based enrichment analyses; instead, it detects the most specific terms, which are unique to a group.
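The mapping to a reduced space can be sketched as follows, assuming a small table of per-term scores across three fractions. The term names, numbers, and helper functions are invented for illustration. Power iteration is used only to keep the example free of dependencies and recovers just the first principal component; it is not claimed to be how PCA2GO computes its components.

```python
import math


def center_columns(rows):
    """Subtract each column mean so every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_pc(cov, iters=500):
    """Leading eigenvector and eigenvalue of cov by power iteration.

    Assumes cov is non-zero and the start vector is not orthogonal
    to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    return v, sum(a * b for a, b in zip(v, cv))


# Hypothetical scores of four GO terms (rows) in three fractions (columns).
terms = ["term_A", "term_B", "term_C", "term_D"]
scores = [[0.9, 0.1, 0.1],
          [0.8, 0.2, 0.1],
          [0.1, 0.9, 0.8],
          [0.2, 0.8, 0.9]]

centered = center_columns(scores)
cov = covariance(centered)
pc1, eigenvalue = first_pc(cov)

# Coordinate of each term on the first principal component.
projection = {t: sum(x * w for x, w in zip(row, pc1))
              for t, row in zip(terms, centered)}
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(projection, explained)
```

In this toy table, term_A and term_B fall on one side of the first component and term_C and term_D on the other, which is the kind of group formation the highlights describe.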

Summary

Introduction

Introduction to PCA: the idea is to map the investigated complex system from a multidimensional space to a reduced space spanned by a few principal components (PCs), thereby revealing the most important features underlying the data set. Identifying highly specific GO terms in complex data sets is relatively complicated, and displaying GO term assignments and GO enrichment analysis as simple tables or pie charts is not optimal. Valuable information, such as the hierarchical position of a single GO term within the GO tree (topological ordering) or enrichment within a complex set of biological experiments, is not displayed. We were able to show that our R score, combined with principal component analysis (PCA), is capable of detecting strongly represented branches of the GO DAG (i.e. the gene products included within these branches), with results comparable to the output of standard p-value enrichment tools.
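The formula for R is not given in this summary, so the following is only a stand-in that illustrates the two ingredients named above: the relative frequency of a term in the data set and its depth in the GO DAG. The product used here, the toy DAG, the term names, and the protein annotations are all assumptions made for illustration and should not be read as the published score.

```python
# Toy GO-like DAG: term -> list of direct parents. All names are invented.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic": ["root"],
    "ion_binding": ["binding"],
    "metal_ion_binding": ["ion_binding"],
    "kinase": ["catalytic", "binding"],
}

# Invented direct annotations: protein -> list of GO terms.
ANNOTATIONS = {
    "p1": ["metal_ion_binding"],
    "p2": ["metal_ion_binding"],
    "p3": ["kinase"],
    "p4": ["binding"],
}


def term_depths(parents):
    """Depth of each term, taken as the longest path to a root."""
    memo = {}

    def visit(term):
        if term not in memo:
            ps = parents[term]
            memo[term] = 0 if not ps else 1 + max(visit(p) for p in ps)
        return memo[term]

    for term in parents:
        visit(term)
    return memo


def with_ancestors(term, parents):
    """The term itself plus every ancestor reachable through its parents."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen


def toy_scores(annotations, parents):
    """Stand-in score: relative frequency times normalized depth.

    Assumes the DAG has at least one non-root term (max depth > 0).
    """
    depths = term_depths(parents)
    max_depth = max(depths.values())
    counts = {t: 0 for t in parents}
    for terms in annotations.values():
        covered = set()
        for t in terms:
            covered |= with_ancestors(t, parents)  # propagate to ancestors
        for t in covered:
            counts[t] += 1
    n = len(annotations)
    return {t: (counts[t] / n) * (depths[t] / max_depth) for t in parents}


scores = toy_scores(ANNOTATIONS, PARENTS)
for term, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:20s} {value:.3f}")
```

With these toy inputs the deepest well-populated term ranks first, while the root, which every protein maps to, scores zero. This mirrors the stated goal of favoring specific terms over frequent but general ones.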



Journal: BMC Bioinformatics
Publication Date: Jun 21, 2010
Citations: 17
License type: CC BY 2.0
