Abstract

Gene Set Context Analysis (GSCA) is an open-source software package that helps researchers use the massive amount of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in more than 25,000 consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g., different cell types, tissues and diseases). By providing one or more genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with that pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set function and thereby increase the value of their experiments. GSCA has a graphical user interface (GUI) that makes the analysis convenient and customizable, and analysis results can be exported as publication-quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. The software substantially lowers the bar for biomedical investigators to use PED in their daily research to generate and screen hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data.
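The query workflow described in the abstract can be illustrated with a toy sketch. For illustration only, it assumes that a gene set's activity in a sample is the mean expression of its member genes, that the pattern of interest is simply "activity at or above a threshold", and that contexts are ranked by a fold change defined as the fraction of matching samples in a context divided by the fraction in the whole compendium. None of these choices is claimed to be GSCA's exact method; the function names and the data are hypothetical.

```python
from collections import defaultdict


def gene_set_activity(sample_expr, gene_set):
    """Hypothetical activity score: mean expression of the set's genes in one sample."""
    values = [sample_expr[g] for g in gene_set if g in sample_expr]
    if not values:
        raise ValueError("none of the gene set's genes were measured in this sample")
    return sum(values) / len(values)


def rank_contexts(samples, gene_set, threshold):
    """Rank biological contexts by enrichment of samples matching a threshold POI.

    samples: list of (context, {gene: expression}) pairs.
    Assumed POI: gene set activity >= threshold.
    Returns (rank, context, active, total, fold_change) tuples.
    """
    active = defaultdict(int)
    total = defaultdict(int)
    for context, expr in samples:
        total[context] += 1
        if gene_set_activity(expr, gene_set) >= threshold:
            active[context] += 1
    # Fraction of all samples in the compendium that satisfy the POI.
    overall = sum(active.values()) / sum(total.values())
    rows = []
    for context, n in total.items():
        frac = active[context] / n
        fold = frac / overall if overall > 0 else 0.0
        rows.append((context, active[context], n, fold))
    rows.sort(key=lambda row: row[3], reverse=True)
    return [(rank, *row) for rank, row in enumerate(rows, start=1)]


# Toy compendium: (context, expression of genes A and B).
samples = [
    ("liver", {"A": 5.0, "B": 7.0}),
    ("liver", {"A": 6.0, "B": 6.0}),
    ("brain", {"A": 1.0, "B": 2.0}),
    ("brain", {"A": 6.0, "B": 8.0}),
    ("blood", {"A": 0.5, "B": 0.5}),
]

for row in rank_contexts(samples, ["A", "B"], threshold=5.0):
    print(row)
```

Ranking by the within-context fraction, rather than by the raw number of matching samples, keeps large contexts from dominating simply because they contain many samples; dividing by the compendium-wide fraction only rescales the scores so that a value of 1 means no enrichment.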

Highlights

  • To check whether the major tissues and cell types are covered by our annotations, we compiled a list of 30 major tissue types and a list of 167 major cell types by manually integrating information from the TiGER database [3], ENCODE [4] and human expert knowledge (Supplementary Table 1).

  • Among the 167 major cell types, 119 (71.3%) were covered by our human PED annotations, 113 (67.7%) by our mouse PED annotations, and 137 (82.0%) by either the human or the mouse annotations. These results indicate that the annotations used by GSCA provide good coverage of the major tissues and cell types.

  • When the pattern of interest (POI) is specified interactively, the interface is primarily used to help one formalize the question, and the POI is defined before one looks at the GSCA results. For applications of this type, the adjusted p-value reported by GSCA can be used as a measure of statistical significance, as long as one does not repeatedly tune the POI based on the GSCA results to make the findings “more significant”.
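The coverage figures for the 167 major cell types follow from simple counts. This short sketch recomputes the reported percentages and, by inclusion-exclusion, the number of cell types covered by both the human and the mouse annotations (a value implied by, but not stated in, the highlights). The constants come from the text; the variable and helper names are illustrative only.

```python
# Counts taken from the highlights; helper names are illustrative.
MAJOR_CELL_TYPES = 167
HUMAN_COVERED = 119
MOUSE_COVERED = 113
EITHER_COVERED = 137  # covered by the human or the mouse annotations


def pct(count: int, total: int = MAJOR_CELL_TYPES) -> float:
    """Percentage of the major cell types, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Inclusion-exclusion: |H or M| = |H| + |M| - |H and M|
both_covered = HUMAN_COVERED + MOUSE_COVERED - EITHER_COVERED
uncovered = MAJOR_CELL_TYPES - EITHER_COVERED

print(f"human:   {pct(HUMAN_COVERED)}%")   # 71.3%
print(f"mouse:   {pct(MOUSE_COVERED)}%")   # 67.7%
print(f"either:  {pct(EITHER_COVERED)}%")  # 82.0%
print(f"both:    {both_covered} ({pct(both_covered)}%)")  # 95 (56.9%)
print(f"neither: {uncovered} ({pct(uncovered)}%)")        # 30 (18.0%)
```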


Summary

Annotation of biological contexts

The annotations of biological contexts used by GSCA are based on the sample annotations provided by BARCODE [1]. The biological context of a sample is defined and annotated as the sample’s cell or tissue type together with its associated treatment or disease condition. An alternative is keyword-based annotation, in which a sample can be annotated with “stem cell”, “embryonic stem cell” and “undifferentiated” simultaneously, and each of these keywords defines a biological context. The keywords can be grouped into different categories (e.g., “male” and “female” are two keywords for “gender”; “stem cell” and “neuron” are two keywords for “cell type”), and there may be internal structure among keywords (e.g., “embryonic stem cell” belongs to “stem cell”). If such a keyword-based annotation is available, one can generalize the current GSCA to test the association of a POI with each keyword. GSCA does not currently provide such an annotation system; we plan to incorporate one in the future, once it matures and has been systematically evaluated.
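The keyword-based generalization could look roughly like the following sketch. It propagates each keyword to its ancestors (so a sample tagged “embryonic stem cell” also counts toward “stem cell”), tests each keyword for over-representation of POI samples with a one-sided hypergeometric test, and adjusts the p-values with the Benjamini-Hochberg procedure. The hierarchy, the sample data, and the choice of test and multiple-testing adjustment are assumptions made for illustration; they are not taken from GSCA.

```python
from math import comb

# Hypothetical keyword hierarchy (child -> parent); not GSCA's annotation system.
PARENT = {"embryonic stem cell": "stem cell"}


def expand(keywords, parent=PARENT):
    """Return the keywords plus all of their ancestors in the hierarchy."""
    out = set()
    for kw in keywords:
        # Walk up the hierarchy; stop early if this branch was already added.
        while kw is not None and kw not in out:
            out.add(kw)
            kw = parent.get(kw)
    return out


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N samples, K POI samples, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def keyword_association(sample_keywords, poi):
    """sample_keywords: one keyword set per sample; poi: one bool per sample."""
    expanded = [expand(kws) for kws in sample_keywords]
    N, K = len(expanded), sum(poi)
    rows, pvals = [], []
    for kw in sorted(set().union(*expanded)):
        members = [i for i, kws in enumerate(expanded) if kw in kws]
        k = sum(poi[i] for i in members)
        rows.append((kw, k, len(members)))
        pvals.append(hypergeom_sf(k, N, K, len(members)))
    adjusted = benjamini_hochberg(pvals)
    return {kw: (k, n, p, q) for (kw, k, n), p, q in zip(rows, pvals, adjusted)}


# Toy data: keyword sets per sample and whether each sample matches the POI.
samples = [
    {"embryonic stem cell", "undifferentiated"},
    {"embryonic stem cell", "undifferentiated"},
    {"stem cell"},
    {"neuron"},
    {"neuron"},
    {"neuron", "male"},
]
poi = [True, True, True, False, False, False]

for kw, (k, n, p, q) in keyword_association(samples, poi).items():
    print(f"{kw:20s} {k}/{n}  p={p:.3f}  adj={q:.3f}")
```

Propagating keywords up the hierarchy is what lets a parent term such as “stem cell” collect evidence from all of its more specific children, at the cost of making the per-keyword tests correlated with one another.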
