Search GenBank: interactive orchestration and ad-hoc choreography of Web services in the exploration of the biomedical resources of the National Center For Biotechnology Information

Dariusz Mrozek,Bożena Małysiak-Mrozek,Artur Siążnik

doi:10.1186/1471-2105-14-73

Abstract

BackgroundDue to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing.ResultsWe have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user’s query, advanced data searching based on the specified user’s query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases.Conclusionssearch GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/.

Highlights

Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration
If we are interested in a wider spectrum of relations, for example, our goal is to find the amino acid sequences associated with genes that are, in some way, related to nucleotide sequences from the population set of sequences belonging to the mouse, the number of calls to the ELink tool is three: ESearch → ELink → ELink → ELink → EFetch For such a case, the ESearch returns a list of unique identifiers (UID) for all of the population sets of sequences for the mouse retrieved from the PopSet database
Comparing the search GenBank system to other tools that orchestrate Web services, like Taverna and SADI, and commercial MapForce and SQL Server Integration Services, we can say that these tools have a universal purpose and provide the possibility to explore a broader range of resources and Web services

Summary

Introduction

Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. The cooperation of scientists from different domains initiated the formation of intermediate and interdisciplinary scientific fields; those which can combine knowledge and experience from apparently distant research areas. Medicine and life sciences began to generate a huge amount of data This unimaginably large amount of data had to be processed and stored in some way. Various research centers around the world began to create data repositories trying to solve the problem of collecting and processing large volumes of biological information. The NCBI hosts the databases, and provides a uniform system for data retrieval and additional programming tools that allow software developers to create specialized applications for biomedical data integration

Objectives

Results

Discussion

Conclusion