Abstract
MotivationGene set testing, or pathway analysis, has become a critical tool for the analysis of high-dimensional genomic data. Although the function and activity of many genes and higher-level processes is tissue-specific, gene set testing is typically performed in a tissue agnostic fashion, which impacts statistical power and the interpretation and replication of results.ResultsTo address this challenge, we have developed a bioinformatics approach to compute tissue-specific weights for individual gene sets using information on tissue-specific gene activity from the Human Protein Atlas (HPA). We used this approach to create a public repository of tissue-specific gene set weights for 37 different human tissue types from the HPA and all collections in the Molecular Signatures Database. To demonstrate the validity and utility of these weights, we explored three different applications: the functional characterization of human tissues, multi-tissue analysis for systemic diseases and tissue-specific gene set testing.Availability and implementationAll data used in the reported analyses is publicly available. An R implementation of the method and tissue-specific weights for MSigDB gene set collections can be downloaded at http://www.dartmouth.edu/∼hrfrost/TissueSpecificGeneSets.
Highlights
Gene set testing, or pathway analysis, has become an indispensable tool for the analysis and interpretation of high dimensional genomic data, including measures of DNA sequence variation, DNA methylation, RNA expression and protein abundance (Hung et al, 2012; Khatri et al, 2012)
To address this challenge, we have developed a bioinformatics approach to compute tissue-specific weights for individual gene sets using information on tissue-specific gene activity from the Human Protein Atlas (HPA)
We used this approach to create a public repository of tissuespecific gene set weights for 37 different human tissue types from the HPA and all collections in the Molecular Signatures Database
Summary
Pathway analysis, has become an indispensable tool for the analysis and interpretation of high dimensional genomic data, including measures of DNA sequence variation, DNA methylation, RNA expression and protein abundance (Hung et al, 2012; Khatri et al, 2012). By focusing on the collective effect of biologically meaningful groups of genomic variables, rather than just the marginal effect of individual variables, gene set testing methods can significantly improve statistical power, replication of results and biological interpretation (Allison et al, 2006; Goeman and Buhlmann, 2007). It is well known that the expression and function of many genes is strongly linked to tissue context (Bossi and Lehner, 2009; Dezso et al, 2008; Ju et al, 2013; Keshava Prasad et al, 2009; Winter et al, 2004) [e.g. coagulation factor II (thrombin) is enriched in the liver (Uhlen et al, 2015)], with tissue-specificity extending to a sizable number of higher-level pathways, processes and cellular functions [e.g. keratinocyte differentiation is specific to the epidermis (Pierson et al, 2015)].
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.