Abstract

Background

Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data, representing both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, by identifying intervention targets in parasites and by informing the future direction of ongoing and planned projects. A key challenge in achieving this objective is the heterogeneity between the internal lab data, usually stored as flat files, Excel spreadsheets, or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. Thus, we developed an integrated environment using Semantic Web technologies that may provide biologists with the tools for managing and analyzing their data, without the need to acquire in-depth computer science knowledge.

Methodology/Principal Findings

We developed a semantic problem-solving environment (SPSE) that uses ontologies to integrate internal lab data with external resources in a Parasite Knowledge Base (PKB), which can be queried across these resources in a unified manner. The SPSE includes Web Ontology Language (OWL)-based ontologies, experimental data with its provenance information represented using the Resource Description Framework (RDF), and a visual querying tool, Cuebee, that features integrated use of Web services. We demonstrate the use and benefit of the SPSE using example queries for identifying gene knockout targets of Trypanosoma cruzi for vaccine development. Answering these queries involves looking up multiple sources of data, linking them together, and presenting the results.

Conclusion/Significance

The SPSE helps parasitologists leverage the growing, but disparate, parasite data resources by offering an integrative platform that utilizes Semantic Web techniques, while keeping any increase in their workload minimal.

Highlights

  • Vast quantities of “-omics” data have been created, and more are being generated at an increasingly rapid pace

  • Integrating lab data with public resources is difficult for biologists who may not possess significant computational skills to acquire and process heterogeneous data stored at different locations

  • We demonstrate the significance of a semantic problem-solving environment (SPSE) in identifying gene knockout targets for T. cruzi

Introduction

Vast quantities of “-omics” data (proteomic, genomic, transcriptomic, metabolomic, etc.) have been created, and more are being generated at an increasingly rapid pace. These data reside in internal lab-specific repositories and in a growing number of external databases, such as GeneDB [1], the EuPathDB databases [2], TrypanoCyc [3], and TcSNP [4] for the parasite T. cruzi. To identify genes whose deletion by insertional knockout might result in avirulent and nonpathogenic strains (i.e., potential “vaccine” strains) of pathogenic organisms, investigators may need to integrate their internal lab-specific gene expression or protein localization data with publicly available gene information sources, such as the Gene Ontology (GO) [5]; pathway information sources, such as KEGG [6]; and orthologous gene sources, such as TriTrypDB. Such gene, pathway, and ortholog resources are publicly available and are often cross-referenced to each other, but they are not integrated with “new” data that are being generated in various laboratories and are not yet in these repositories. We developed an integrated environment using Semantic Web technologies that may provide biologists with the tools for managing and analyzing their data, without the need to acquire in-depth computer science knowledge.
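
The kind of integration described above can be illustrated with a minimal sketch. It is not the SPSE implementation, which relies on OWL ontologies, RDF data, and the Cuebee query tool. It is a plain-Python stand-in that stores facts as subject-predicate-object triples and answers one question across internal and external sources. All gene identifiers, predicate names (`lab:expressedInStage`, `ext:inPathway`, `ext:hasOrtholog`), and pathway labels are hypothetical and chosen only for illustration.

```python
# Minimal sketch, assuming facts can be modeled as (subject, predicate, object)
# triples that share gene identifiers across sources. All identifiers below are
# hypothetical; none are taken from the article or from any real database.

# Hypothetical internal lab data: life-cycle stage in which a gene is expressed.
internal_lab = {
    ("gene:TcA", "lab:expressedInStage", "amastigote"),
    ("gene:TcB", "lab:expressedInStage", "epimastigote"),
}

# Hypothetical external annotations: pathway membership and orthologs.
external = {
    ("gene:TcA", "ext:inPathway", "pathway:P1"),
    ("gene:TcA", "ext:hasOrtholog", "gene:TbA"),
    ("gene:TcB", "ext:inPathway", "pathway:P2"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def candidates(graph, stage):
    """Map each gene expressed in `stage` (internal data) to its sorted
    pathway annotations (external data); genes without pathways are skipped."""
    result = {}
    for gene, _, _ in match(graph, p="lab:expressedInStage", o=stage):
        pathways = sorted(o for _, _, o in match(graph, s=gene, p="ext:inPathway"))
        if pathways:
            result[gene] = pathways
    return result


# Integration here is just a set union, because both sources use the same
# gene identifiers. Reconciling identifiers is the hard part in practice.
knowledge_base = internal_lab | external

print(candidates(knowledge_base, "amastigote"))
```

The union step works only because both toy sources already agree on gene identifiers. With real lab files and public databases, reconciling identifiers and vocabularies is the heterogeneity problem that the ontologies are meant to address.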

