Abstract

Background: As genome sequences are determined for increasing numbers of model organisms, demand has grown for better tools to facilitate unified genome annotation efforts by communities of biologists. Typically this process involves numerous experts from the field and the use of data from dispersed sources as evidence. This kind of collaborative annotation project requires specialized software solutions for efficient data tracking and processing.

Results: As part of the scale-up phase of the ENCODE project (Encyclopedia of DNA Elements), the aim of the GENCODE project is to produce a highly accurate evidence-based reference gene annotation for the human genome. The AnnoTrack software system was developed to aid this effort. It integrates data from multiple distributed sources, highlights conflicts and facilitates the quick identification, prioritisation and resolution of problems during the process of genome annotation.

Conclusions: AnnoTrack has been in use for the last year and has proven a very valuable tool for large-scale genome annotation. Designed to interface with standard bioinformatics components, such as DAS servers and Ensembl databases, it is easy to set up and configure for different genome projects. The source code is available at http://annotrack.sanger.ac.uk.

Highlights

  • As genome sequences are determined for increasing numbers of model organisms, demand has grown for better tools to facilitate unified genome annotation efforts by communities of biologists

  • The output of the GENCODE project within this framework [1], [2] is a set of genes assessed by several different methods and considered to be a reference set for other genome analyses

  • DAS servers and other data sources are accessed by the AnnoTrack system using automated Perl scripts in regular intervals; their data is compared to existing annotation and integrated into the database


Summary

Results

Data Integration

All GENCODE partners provide real-time access to their annotation and analysis results via DAS servers (Distributed Annotation System [12]). The application of DAS to supply genome-wide annotation on this scale is new, and poses challenges concerning the number of features, the different annotation formats and the interconnections between them. The AnnoTrack system accesses DAS servers and other data sources (flat files, databases) at regular intervals using automated Perl scripts; their data is compared to the existing annotation and integrated into the database (figure 1). The comparison covers the genomic coordinates as well as the textual descriptions. Running the system does not require knowledge of the Redmine code or Ruby programming skills, but adjusting the parsers requires knowledge of Perl.
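The comparison step described above can be illustrated with a minimal sketch. It is written in Python for brevity, whereas AnnoTrack itself uses Perl scripts, and it is not taken from the AnnoTrack source: the `Feature` record, its fields and the `compare_features` function are hypothetical names chosen for illustration. It assumes that features from a source can be matched to stored annotation by a shared identifier, which real data sources may not guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical, simplified annotation feature (not AnnoTrack's schema)."""
    feature_id: str
    chromosome: str
    start: int        # assumed 1-based, inclusive
    end: int          # assumed 1-based, inclusive
    description: str


def compare_features(existing, incoming):
    """Classify incoming features against the stored annotation.

    Returns a dict of feature-ID lists keyed by 'new', 'unchanged',
    'coordinates_changed' and 'description_changed'.
    """
    stored = {f.feature_id: f for f in existing}
    report = {
        "new": [],
        "unchanged": [],
        "coordinates_changed": [],
        "description_changed": [],
    }
    for feat in incoming:
        old = stored.get(feat.feature_id)
        if old is None:
            # Not present in the stored annotation: candidate for integration.
            report["new"].append(feat.feature_id)
            continue
        # Compare genomic coordinates.
        same_coords = (old.chromosome, old.start, old.end) == (
            feat.chromosome, feat.start, feat.end)
        # Compare textual descriptions, ignoring case and outer whitespace.
        same_text = (old.description.strip().lower()
                     == feat.description.strip().lower())
        if not same_coords:
            report["coordinates_changed"].append(feat.feature_id)
        if not same_text:
            report["description_changed"].append(feat.feature_id)
        if same_coords and same_text:
            report["unchanged"].append(feat.feature_id)
    return report
```

In a setup like the one described, a periodic job could call `compare_features` with the stored annotation and a freshly fetched batch, then flag the entries listed under `coordinates_changed` or `description_changed` for review. How AnnoTrack actually records such conflicts is not specified in this section.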

