Abstract
Background: Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. To suggest common biological processes and functions for these genes, Gene Ontology (GO) annotations combined with statistical testing are widely used. However, such analyses can report a very large number of significantly altered biological processes, which makes GO results hard to interpret and novel testable biological hypotheses hard to identify.
Results: We present fast software for advanced gene annotation that uses semantic similarity between Gene Ontology terms, combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster.
Conclusion: Our R-based, open-source semantic similarity package is over 2000 times faster than existing implementations. From the resulting hierarchical clustering dendrogram, genes sharing a GO term can be identified, and the differences in their gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.
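The abstract does not say which semantic similarity measures the package implements. As an illustration only, the sketch below computes Resnik's measure, one widely used information-content (IC) based similarity for GO terms: IC(t) = -log p(t), where p(t) is the fraction of annotations made to t or any of its descendants, and the similarity of two terms is the IC of their most informative common ancestor. The toy ontology, the term names and the annotation counts are all hypothetical.

```python
import math

# Hypothetical toy GO fragment: term -> set of parent terms (is_a edges only).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 3, "A1": 4, "A2": 5, "B1": 6}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities():
    """p(t): share of all annotations made to t or its descendants."""
    totals = {t: 0 for t in PARENTS}
    for term, n in DIRECT_COUNTS.items():
        # An annotation to a term also counts for every ancestor.
        for anc in ancestors(term):
            totals[anc] += n
    root_total = totals["root"]
    return {t: totals[t] / root_total for t in PARENTS}


P = term_probabilities()


def ic(term):
    """Information content IC(t) = -log p(t); the root has IC 0."""
    return -math.log(P[term])


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic(t) for t in common)


print(round(resnik("A1", "A2"), 3), resnik("A1", "B1"))
```

Because A1 and B1 share only the root, whose IC is 0, their similarity is 0; A1 and A2 share the more specific term A, so they score log(20/11). A common ancestor with IC > 0 is therefore what makes two terms count as related under this measure.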
Highlights
Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes
We evaluated the package by using a performance benchmark and by applying the methods to microarray data from a testicular germ cell tumor study [17]
We have developed tools to cluster genes from microarray experiments using semantic similarity measures
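The highlight above mentions clustering genes with semantic similarity measures. A minimal sketch of one way to do this, under assumptions the source does not state: term similarities in [0, 1] (for example from an IC-based measure) are combined into gene similarities with the best-match average (BMA), turned into distances as 1 minus similarity, and clustered with naive average linkage (UPGMA). The gene names, the placeholder terms T1 to T3 and the similarity values are hypothetical, and this is not the package's own code.

```python
from itertools import combinations

# Hypothetical pairwise term similarities in [0, 1]; T1..T3 stand in for GO terms.
TERM_SIM = {("T1", "T2"): 0.7, ("T1", "T3"): 0.1, ("T2", "T3"): 0.3}

# Hypothetical gene -> GO term annotations.
GENES = {
    "g1": ["T1"],
    "g2": ["T1", "T2"],
    "g3": ["T3"],
    "g4": ["T2", "T3"],
}


def term_sim(a, b):
    """Symmetric lookup; identical terms have similarity 1."""
    if a == b:
        return 1.0
    return TERM_SIM.get((a, b), TERM_SIM.get((b, a), 0.0))


def gene_sim(terms_a, terms_b):
    """Best-match average (BMA) of term similarities between two genes."""
    best_a = [max(term_sim(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(term_sim(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


def average_linkage(names, dist):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Returns merges as (left, right, height), where left and right are
    tuples of leaf names: the information a dendrogram is drawn from.
    """
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            # Mean of all pairwise distances between the two clusters.
            d = sum(dist[x][y] for x in a for y in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Gene distance = 1 - semantic similarity.
DIST = {a: {b: 1.0 - gene_sim(GENES[a], GENES[b]) for b in GENES} for a in GENES}
MERGES = average_linkage(list(GENES), DIST)
for left, right, height in MERGES:
    print(left, "+", right, f"at {height:.3f}")
```

Reading the leaves of the final merge from left to right gives a row order (g1, g2, g3, g4 here) that could be used to lay out an accompanying expression heat map next to the dendrogram. The loop recomputes every cluster average at each step, which is fine for a toy example but scales poorly, and it says nothing about how the package achieves its reported speed.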
Summary
We have developed tools to cluster genes from microarray experiments using semantic similarity measures. Our efficient implementation of similarity measures enables analysis of gene sets with hundreds of genes, as is typical of microarray experiments. We combined expression data and GO annotations using hierarchical clustering and a heat map visualisation that together enable rapid identification of genes sharing similar biological functions. Our results suggest that GO-based annotation analysis may be able to make better use of the knowledge accumulated in the literature than approaches using pathway databases, which are typically updated at a much slower pace than the GO database.
[Caption fragments: At most three GO terms are shown for each cluster. Common GO terms with IC (information content) > 0 are shown.]
[Table fragment, KEGG pathways listed: MAPK signaling pathway, Focal adhesion, Gap junction, Regulation of actin cytoskeleton, Glioma, Prostate cancer, Melanoma. Genes not shown in the table did not have any KEGG pathway annotation.]