Abstract

Background: The availability of whole-genome sequences allows for the identification of the entire set of protein-coding genes as well as their regulatory regions. This can be accomplished using multiple complementary methods that include ESTs, homology searches and ab initio gene predictions. Previously, the Genie gene-finding algorithm was trained on a small set of Chlamydomonas genes and shown to improve the accuracy of gene prediction in this species compared to other available programs. To improve ab initio gene finding in Chlamydomonas, we constructed a new training set of over 2,300 cDNAs by assembling over 167,000 Chlamydomonas EST entries from GenBank with the EST assembly tool PASA.

Results: Our cDNA-trained gene-finder, GreenGenie2, attains 83% sensitivity and 83% specificity for exons on short-sequence predictions. We predict about 12,000 genes in the v3 Chlamydomonas genome assembly, most of which (78%) are either identical to or significantly overlap the published catalog of Chlamydomonas genes [1]. Of the published catalog, 22% is absent from the GreenGenie2 predictions, and 23% of the GreenGenie2 predictions are absent from the published gene catalog. Randomly chosen gene models were tested by RT-PCR, and most of the tests support the GreenGenie2 predictions.

Conclusion: These data suggest that training with EST assemblies is highly effective and that GreenGenie2 is a valuable, complementary tool for predicting genes in Chlamydomonas reinhardtii.
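The exon-level sensitivity and specificity figures quoted in the abstract can be illustrated with a minimal sketch. It assumes a convention that is common in gene-finder evaluation: sensitivity is the fraction of annotated exons that are predicted exactly, and specificity is the fraction of predicted exons that exactly match an annotated exon. The abstract does not state the matching criterion the authors used, so the exact-boundary rule, the `Exon` tuple layout and the function name `exon_accuracy` are illustrative assumptions, not the authors' procedure.

```python
from typing import Iterable, Tuple

# Hypothetical exon representation: (sequence id, start, end, strand).
Exon = Tuple[str, int, int, str]


def exon_accuracy(predicted: Iterable[Exon],
                  annotated: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumption: a predicted exon counts as correct only if both of its
    boundaries and its strand match an annotated exon exactly.
    """
    pred = set(predicted)
    ref = set(annotated)
    true_positives = len(pred & ref)
    # Sensitivity: share of annotated exons recovered by the predictions.
    sensitivity = true_positives / len(ref) if ref else 0.0
    # Specificity (gene-finding convention): share of predictions that are correct.
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    annotated = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
                 ("chr1", 500, 600, "+"), ("chr1", 700, 800, "+")]
    predicted = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
                 ("chr1", 500, 600, "+"), ("chr1", 710, 800, "+")]
    sn, sp = exon_accuracy(predicted, annotated)
    print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

With this definition, a prediction whose boundary is off by even a few bases counts as both a missed annotated exon and a wrong predicted exon, which is why the two measures often move together.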

Highlights

  • Constructing and evaluating a training set of gene predictions from expressed sequence tags (ESTs): the Program to Assemble Spliced Alignments (PASA) aligned 167,641 high-quality Chlamydomonas EST sequences onto the published Chlamydomonas genome assembly (v3) and assembled those alignments into 19,707 unique models

  • The Program to Assemble Spliced Alignments (PASA) [2] was used to assemble Chlamydomonas EST sequences that were pre-aligned to the v3 Chlamydomonas genome assembly

Introduction

The availability of whole-genome sequences allows for the identification of the entire set of protein-coding genes as well as their regulatory regions. This can be accomplished using multiple complementary methods that include expressed sequence tags (ESTs), homology searches and ab initio gene predictions. ESTs provide experimental evidence for the transcription of specific regions of the genome, and significant similarity with known proteins in other organisms provides evidence for the existence of a gene. Both approaches have limitations that often preclude them from identifying the complete gene set. Ab initio gene-finders provide a complementary gene identification method by predicting gene models based on the statistical characteristics of a representative set of protein-coding genes from the genome of interest.
