Abstract

Background

A large number of gene prediction programs for the human genome exist. These annotation tools use a variety of methods and data sources. In the recent ENCODE genome annotation assessment project (EGASP), some of the most commonly used and recently developed gene-prediction programs were systematically evaluated and compared on test data from the human genome. AUGUSTUS was among the tools that were tested in this project.

Results

AUGUSTUS can be used as an ab initio program, that is, as a program that uses only a single genomic sequence as input. In addition, it can combine information from the genomic sequence under study with external hints from various sources. For EGASP, we used genomic sequence alignments as well as alignments to expressed sequence tags (ESTs) and protein sequences as additional sources of information. Within the category of ab initio programs, AUGUSTUS predicted significantly more genes correctly than any other ab initio program. At the same time, it predicted the fewest false-positive genes and the fewest false-positive exons of all ab initio programs. The accuracy of AUGUSTUS could be further improved when additional extrinsic data, such as alignments to ESTs, protein sequences and/or genomic sequences, were taken into account.

Conclusion

AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools. Moreover, it is very flexible because it can take information from several sources into account simultaneously.

Highlights

  • Ab initio gene prediction is an important tool for finding new genes for which sufficient evidence from transcribed sequences is not available. This matters especially in genome projects of species where a large fraction of the genes cannot be reconstructed from expressed sequence tag (EST) evidence

  • For the test set of the ENCODE genome annotation assessment project (EGASP), the predictions of five ab initio single-genome programs were evaluated by the organizers of the workshop [6]
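Evaluations like the one described above compare predicted gene structures against a reference annotation and count correct and false-positive exons. The sketch below shows one common way to compute exon-level accuracy. It is an illustration only: the exact-match rule, the tuple layout and the function name `exon_accuracy` are assumptions made for this example, not the EGASP evaluation code. Following the usual convention in gene-finding assessments, "specificity" here means TP / (TP + FP).

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed exon key: (chromosome, strand, start, end). A predicted exon counts
# as a true positive only if all four fields match a reference exon exactly.
Exon = Tuple[str, str, int, int]


def exon_accuracy(predicted: Iterable[Exon], annotated: Iterable[Exon]) -> Dict[str, float]:
    """Hypothetical helper: exon-level sensitivity and specificity."""
    pred: Set[Exon] = set(predicted)
    ref: Set[Exon] = set(annotated)
    tp = len(pred & ref)  # exons predicted exactly as annotated
    fp = len(pred - ref)  # false-positive exons (predicted, not annotated)
    fn = len(ref - pred)  # annotated exons the predictor missed
    sensitivity = tp / (tp + fn) if ref else 0.0
    specificity = tp / (tp + fp) if pred else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "specificity": specificity}


if __name__ == "__main__":
    reference = [("chr1", "+", 100, 200), ("chr1", "+", 300, 400), ("chr1", "+", 500, 650)]
    prediction = [("chr1", "+", 100, 200), ("chr1", "+", 300, 410),
                  ("chr1", "+", 500, 650), ("chr1", "-", 900, 950)]
    print(exon_accuracy(prediction, reference))
```

In this toy example the exon with a shifted end coordinate (300 to 410) counts as both a false positive and a missed exon, which is why exact-boundary matching is a strict criterion.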

Introduction

With an increasing number of completely or partially sequenced genomes, computational prediction of protein-coding genes has become one of the most active fields of research in bioinformatics. This task is challenging for eukaryotes, where protein-coding exons are usually separated by non-coding introns of varying length. A recent extension of AUGUSTUS is able to integrate extrinsic information from arbitrary sources for improved prediction accuracy [3].
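To make the exon/intron structure concrete: a gene finder essentially outputs exon coordinates, and the coding sequence is obtained by joining those exon segments and skipping the introns between them. The minimal sketch below assumes 1-based, inclusive coordinates (as in GFF) and a hypothetical helper named `splice_cds`; it is not part of AUGUSTUS.

```python
from typing import List, Tuple


def splice_cds(genome: str, exons: List[Tuple[int, int]], strand: str = "+") -> str:
    """Hypothetical helper: join exon segments (1-based, inclusive) into a coding sequence."""
    # Concatenate exons in genomic order; the intron bases between them are dropped.
    cds = "".join(genome[start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        # Genes on the reverse strand are read as the reverse complement.
        complement = str.maketrans("ACGTacgt", "TGCAtgca")
        cds = cds.translate(complement)[::-1]
    return cds


if __name__ == "__main__":
    # Toy gene: exon 1 (1-6), an intron starting with GT and ending with AG (7-17), exon 2 (18-23).
    genome = "ATGAAA" + "GTAAGTTTTAG" + "GCCTAA"
    print(splice_cds(genome, [(1, 6), (18, 23)]))
```

Here the spliced sequence reads ATG AAA GCC TAA, a start codon followed by two codons and a stop codon; with the intron left in, the reading frame would be broken, which is why locating exon boundaries precisely matters.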
