Abstract

Background: Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. To suggest common biological processes and functions for these genes, Gene Ontology (GO) annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. It is therefore often challenging to interpret GO results and identify novel testable biological hypotheses.

Results: We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster.

Conclusion: Our R-based open-source semantic similarity package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram, genes sharing a GO term can be identified, and differences in their gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.
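To make the idea of semantic similarity between GO terms concrete, the following is a minimal Python sketch of a Resnik-style measure, in which the similarity of two terms is the information content (IC) of their most informative common ancestor. This is an illustration only, not the package's R implementation, and the page does not state which measures the package provides. The toy ontology, the term names, the annotation counts, and the function names are all hypothetical.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (not real GO IDs).
PARENTS = {
    "root": set(),
    "signaling": {"root"},
    "adhesion": {"root"},
    "mapk_cascade": {"signaling"},
    "gap_junction": {"signaling", "adhesion"},
}

# Hypothetical counts of genes annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "signaling": 2,
    "adhesion": 3,
    "mapk_cascade": 4,
    "gap_junction": 1,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(total / n(t)), where n(t) counts annotations to t or its descendants."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = dict.fromkeys(PARENTS, 0)
    for term, count in DIRECT_COUNTS.items():
        # An annotation to a term also counts for every ancestor of that term.
        for anc in ancestors(term):
            cumulative[anc] += count
    return {t: math.log(total / n) for t, n in cumulative.items() if n > 0}


def resnik(term1, term2, ic):
    """Resnik-style similarity: highest IC among the common ancestors."""
    common = ancestors(term1) & ancestors(term2)
    return max(ic.get(t, 0.0) for t in common)


ic = information_content()
sim_related = resnik("mapk_cascade", "gap_junction", ic)  # share "signaling"
sim_unrelated = resnik("mapk_cascade", "adhesion", ic)    # share only "root"
```

In this toy example the two signaling-related terms get a positive similarity, while terms whose only common ancestor is the root get a similarity of zero, because the root carries no information.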

Highlights

  • Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes

  • We evaluated the package by using a performance benchmark and by applying the methods to microarray data from a testicular germ cell tumor study [17]

  • We have developed tools to cluster genes from microarray experiments using semantic similarity measures

Summary

Conclusion

We have developed tools to cluster genes from microarray experiments using semantic similarity measures. Our efficient implementation of similarity measures enables analysis of gene sets with hundreds of genes, as typically seen in microarray experiments. We combined expression data and GO annotations using hierarchical clustering and a heat map visualisation, which together enable rapid identification of genes sharing similar biological functions. Our results suggest that GO-based annotation analysis may be better able to exploit the accumulated knowledge available in the literature than approaches using pathway databases, which are typically updated at a much slower pace than the GO database.

Figure and table notes: At most three GO terms are shown for each cluster. Common GO terms with IC > 0 are shown. Pathways listed: MAPK signaling pathway, Focal adhesion, Gap junction, Regulation of actin cytoskeleton, Glioma, Prostate cancer, Melanoma. Genes not shown in the table did not have any KEGG pathway annotation.
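The clustering step described in the conclusion can be sketched as follows: gene-level semantic similarities are converted to distances, and genes are merged bottom-up until the requested number of clusters remains. This is a minimal pure-Python illustration using average linkage, which is an assumption here because the page does not state the linkage method used. The gene labels, similarity values, and function names are invented for the example.

```python
from itertools import combinations

# Hypothetical gene-by-gene semantic similarities, scaled to [0, 1].
# The gene labels and values are invented for illustration.
SIM = {
    ("geneA", "geneB"): 0.9,
    ("geneA", "geneC"): 0.2,
    ("geneA", "geneD"): 0.1,
    ("geneB", "geneC"): 0.3,
    ("geneB", "geneD"): 0.2,
    ("geneC", "geneD"): 0.8,
}


def distance(g1, g2):
    """Turn a similarity into a distance; assumes similarities lie in [0, 1]."""
    sim = SIM.get((g1, g2), SIM.get((g2, g1)))
    return 1.0 - sim


def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster(genes, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [(g,) for g in genes]
    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(*pair),
        )
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return sorted(tuple(sorted(c)) for c in clusters)


groups = cluster(["geneA", "geneB", "geneC", "geneD"], n_clusters=2)
print(groups)
```

With these invented values, the two pairs of genes with high mutual similarity end up in separate clusters. In practice, each cluster would then be labelled with its common GO terms and shown alongside the expression heat map.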

