XML schemas for common bioinformatic data types and their application in workflow systems.

Philipp N Seibel, Knut Schwarzer, Henning Mersch, Robert Giegerich, Jan Krüger, Sven Hartmeier, Kai Löwenthal, Thomas Dandekar

doi:10.1186/1471-2105-7-490

Open Access

Abstract

Background: Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data; therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats.

Results: Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net; the BioDOM library can be obtained at http://biodom.sourceforge.net.

Conclusion: The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.

Highlights

Today, there is a growing need in bioinformatics to combine available software tools into chains, building complex applications from existing single-task tools
We describe some of these XML schemas and show examples of their application
The BioDOM library contains one Java class for each natively supported XML format. Each class implements methods to create the corresponding data structure, either by adding the necessary parts to a new document or by importing data from ordinary data formats into XML elements. Each of these classes is based on the abstract class AbstractBioDOM, which provides methods commonly required by all converters, e.g. for setting and getting the Document Object Model (DOM) content, validating the document against an XML schema, or creating a string representation of the XML data contained in the object.
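
The class layout described in the last highlight can be illustrated with a minimal sketch built only on the standard Java XML APIs. It is not the BioDOM source: the class names `AbstractFormatSketch` and `SequenceSketch`, every method signature, the tiny inline schema and the FASTA-style import are assumptions made for illustration, and the real HOBIT schemas are richer than the one shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical stand-in for the role the text assigns to AbstractBioDOM.
// No name or signature here is taken from the real BioDOM library.
abstract class AbstractFormatSketch {
    protected Document dom;

    AbstractFormatSketch() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            dom = factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    Document getDom() { return dom; }
    void setDom(Document newDom) { dom = newDom; }

    // Each concrete format class supplies the text of its own XML schema.
    protected abstract String schemaSource();

    // Returns true if the current DOM content conforms to the schema.
    boolean validate() {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(schemaSource())));
            try {
                schema.newValidator().validate(new DOMSource(dom));
                return true;
            } catch (SAXException invalid) {
                return false; // the document violates the schema
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // the schema itself could not be used
        }
    }

    // String representation of the XML data held by this object.
    @Override
    public String toString() {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical converter for one format; element names are made up for this sketch.
class SequenceSketch extends AbstractFormatSketch {
    private static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='sequences'><xs:complexType><xs:sequence>"
        + "<xs:element name='sequence' maxOccurs='unbounded'><xs:complexType>"
        + "<xs:simpleContent><xs:extension base='xs:string'>"
        + "<xs:attribute name='id' type='xs:string' use='required'/>"
        + "</xs:extension></xs:simpleContent></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    SequenceSketch() { dom.appendChild(dom.createElementNS(null, "sequences")); }

    @Override
    protected String schemaSource() { return XSD; }

    // Build the document piece by piece.
    void addSequence(String id, String residues) {
        Element e = dom.createElementNS(null, "sequence");
        e.setAttributeNS(null, "id", id);
        e.setTextContent(residues);
        dom.getDocumentElement().appendChild(e);
    }

    // Import from an ordinary flat format (FASTA-like text, chosen as an example).
    void importFasta(String fasta) {
        String id = null;
        StringBuilder residues = new StringBuilder();
        for (String raw : fasta.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith(">")) {
                if (id != null) addSequence(id, residues.toString());
                id = line.substring(1).trim().split("\\s+")[0];
                residues.setLength(0);
            } else if (!line.isEmpty()) {
                residues.append(line);
            }
        }
        if (id != null) addSequence(id, residues.toString());
    }
}
```

In this sketch a caller would create a `SequenceSketch`, fill it with `addSequence` or `importFasta`, check it with `validate()`, and pass `toString()` to the next tool in a workflow. Keeping validation and serialization in the abstract base means a class for another format only has to supply its schema and its own import logic.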

Summary

Results

Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net; the BioDOM library can be obtained at http://biodom.sourceforge.net.

Journal: BMC Bioinformatics
Publication Date: Nov 6, 2006
Citations: 65
License type: CC BY
