Comparison of computational methods for identifying translation initiation sites in EST data

Afshin Nadershahi, Lynda BM Ellis, Scott C Fahrenkrug

doi:10.1186/1471-2105-5-14

Open Access

Journal: BMC Bioinformatics
Publication Date: Jan 1, 2004
Citations: 67
License type: cc-by

Affiliation: University of Minnesota

Abstract

Background: Expressed Sequence Tag (EST) sequences are generally single-strand, single-pass sequences, only 200–600 nucleotides long, contain errors resulting in frame shifts, and represent different parts of their parent cDNA. If the cDNAs contain translation initiation sites, they may be suitable for functional genomics studies. We have compared five methods to predict translation initiation sites in EST data: first-ATG, ESTScan, Diogenes, NetStart, and ATGpr.

Results: A dataset of 100 EST sequences, 50 with and 50 without translation initiation sites, was created. Based on analysis of this dataset, ATGpr is found to be the most accurate for predicting the presence versus absence of translation initiation sites. With a maximum accuracy of 76%, ATGpr more accurately predicts the position or absence of translation initiation sites than NetStart (57%) or Diogenes (50%). ATGpr similarly excels when start sites are known to be present (90%), whereas NetStart achieves only 60% overall accuracy. As a baseline for comparison, choosing the first ATG correctly identifies the translation initiation site in 74% of the sequences. ESTScan and Diogenes, consistent with their intended use, are able to identify open reading frames, but are unable to determine the precise position of translation initiation sites.

Conclusions: ATGpr demonstrates high sensitivity, specificity, and overall accuracy in identifying start sites while also rejecting incomplete sequences. A database of EST sequences suitable for validating programs for translation initiation site prediction is now available. These tools and materials may open an avenue for future improvements in start site prediction and EST analysis.
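The first-ATG baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `first_atg` and `position_accuracy`, the 0-based coordinates, and the toy sequences are all assumptions made for illustration, and a sequence with no annotated start site is represented here by `None`.

```python
from typing import List, Optional


def first_atg(seq: str) -> Optional[int]:
    """Return the 0-based index of the first ATG in seq, or None if there is none.

    Hypothetical helper: the simplest possible start-site predictor.
    """
    pos = seq.upper().find("ATG")
    return pos if pos >= 0 else None


def position_accuracy(predicted: List[Optional[int]],
                      annotated: List[Optional[int]]) -> float:
    """Fraction of sequences whose predicted start equals the annotated start.

    None on both sides counts as a correct "no start site" call.
    """
    correct = sum(p == a for p, a in zip(predicted, annotated))
    return correct / len(annotated)


# Made-up toy sequences paired with made-up annotated start positions.
examples = [
    ("ccgccATGgcgtga", 5),    # first ATG is the annotated start
    ("ggATGccATGtaa", 7),     # an upstream ATG precedes the annotated start
    ("ccgttgacc", None),      # no ATG and no annotated start
]

predictions = [first_atg(seq) for seq, _ in examples]
truth = [start for _, start in examples]
print(predictions)                            # [5, 2, None]
print(position_accuracy(predictions, truth))  # 2 of 3 correct
```

The second toy sequence shows the obvious weakness of this baseline: any ATG upstream of the true start site produces a wrong call.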

Highlights

Expressed Sequence Tag (EST) sequences are generally single-strand, single-pass sequences, only 200–600 nucleotides long, contain errors resulting in frame shifts, and represent different parts of their parent cDNA
Predicting whether or not EST sequences contain the translation initiation site (TIS) may be very useful for some EST projects
This study evaluates the ability of ESTScan, Diogenes, NetStart, and ATGpr to predict the presence or absence of TIS
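As a rough illustration of how presence-versus-absence predictions can be scored, the sketch below computes sensitivity, specificity, and accuracy from boolean calls. The function name `presence_metrics` and the six toy labels are invented for this example and are not taken from the study's dataset; the sketch also assumes that both classes (TIS present and TIS absent) occur in the input.

```python
from typing import Dict, List


def presence_metrics(predicted: List[bool], actual: List[bool]) -> Dict[str, float]:
    """Score presence/absence calls (True = sequence contains a TIS).

    Assumes at least one positive and one negative in `actual`.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)              # TIS present, called present
    tn = sum((not p) and (not a) for p, a in pairs)  # TIS absent, called absent
    fp = sum(p and (not a) for p, a in pairs)        # TIS absent, called present
    fn = sum((not p) and a for p, a in pairs)        # TIS present, called absent
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Made-up calls for six hypothetical ESTs.
predicted = [True, True, False, False, True, False]
actual = [True, True, True, False, False, False]
metrics = presence_metrics(predicted, actual)
print(metrics)
```

With these toy labels there are two true positives, two true negatives, one false positive, and one false negative, so all three measures come out to 2/3.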

Summary

Introduction

Expressed Sequence Tag (EST) sequences are generally single-strand, single-pass sequences, only 200–600 nucleotides long, contain errors resulting in frame shifts, and represent different parts of their parent cDNA. If the cDNAs contain translation initiation sites, they may be suitable for functional genomics studies. We have compared five methods to predict translation initiation sites in EST data: first-ATG, ESTScan, Diogenes, NetStart, and ATGpr.

Expressed sequence tags

Complete sequences of the mouse and human genomes are available; completion of additional animal genomes is imminent. Effective methods for identifying genes, and the proteins they encode, have become increasingly important. Most genes can be identified through the open reading frame (ORF) of the protein they encode, but detection in eukaryotic genomic sequence is more difficult since these genes are fragmented into small exons (averaging 145 bp in human) extending across large regions (averaging 27 kb in human) [1]. Due to cost and ...