Abstract
BackgroundAlthough carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. Funding for further development of the Complex Carbohydrate Structure Database (CCSD or CarbBank) ceased in 1997, and since then several initiatives have developed independent databases with partially overlapping foci. For each database, different encoding schemes for residues and sequence topology were designed. Therefore, it is virtually impossible to obtain an overview of all deposited structures or to compare the contents of the various databases.ResultsWe have implemented procedures which download the structures contained in the seven major databases, e.g. GLYCOSCIENCES.de, the Consortium for Functional Glycomics (CFG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Bacterial Carbohydrate Structure Database (BCSDB). We have created a new database called GlycomeDB, containing all structures, their taxonomic annotations and references (IDs) for the original databases. More than 100000 datasets were imported, resulting in more than 33000 unique sequences now encoded in GlycomeDB using the universal format GlycoCT. Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators.ConclusionGlycomeDB is a new, publicly available database for carbohydrate sequences with a unified, all-encompassing structure encoding format and NCBI taxonomic referencing. The database is updated weekly and can be downloaded free of charge. The JAVA application GlycoUpdateDB is also available for establishing and updating a local installation of GlycomeDB. With the advent of GlycomeDB, the distributed islands of knowledge in glycomics are now bridged to form a single resource.
Highlights
Carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes
Since none of the existing solutions is truly capable of encompassing the whole problem space of carbohydrate sequences, we found it necessary to develop a new sequence format, called GlycoCT, which is a superset of the structural encoding capabilities inherent in all other formats developed so far [15]
Most of the digitally available carbohydrate sequences stem from retrospective literature analysis, and in most cases errors can only be detected by re-examining the original publications, which is beyond the scope of this project
Summary
Carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. Funding for further development of the Complex Carbohydrate Structure Database (CCSD or CarbBank) ceased in 1997, and since several initiatives have developed independent databases with partially overlapping foci. BMC Bioinformatics 2008, 9:384 http://www.biomedcentral.com/1471-2105/9/384 ices [1]. This problem is especially evident for carbohydrate databases, where sequence information is spread in incompatible formats over several unconnected databases. The first publicly available carbohydrate structure database and the mother of all carbohydrate sequence databases is the Complex Carbohydrate Structure Database (CCSD), often called CarbBank in reference to the retrieval software used to access the data [2,3]. With about 50000 entries and more than 23000 different sequences, the CarbBank is still the largest repository of glycan data available
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.