Characterization of 954 bovine full-CDS cDNA sequences

Gregory P Harhay,Curt P Van Tassell,Warren M Snelling,Ralph T Wiedmann,Michael P Heaton,Michael L Clawson,John W Keele,Timothy Pl Smith,Tad S Sonstegard

doi:10.1186/1471-2164-6-166

Open Access

Abstract

Background

Genome assemblies rely on the existence of transcript sequence to stitch together contigs, verify assembly of whole genome shotgun reads, and annotate genes. Functional genomics studies also rely on transcript sequence to create expression microarrays or interpret digital tag data produced by methods such as Serial Analysis of Gene Expression (SAGE). Transcript sequence can be predicted based on reconstruction from overlapping expressed sequence tags (EST) that are obtained by single-pass sequencing of random cDNA clones, but these reconstructions are prone to errors caused by alternative splice forms, transcripts from gene families with related sequences, and expressed pseudogenes. These errors confound genome assembly and annotation. The most useful transcript sequences are derived by complete insert sequencing of clones containing the entire length, or at least the full protein coding sequence (CDS) portion, of the source mRNA. While the bovine genome sequencing initiative is nearing completion, there is currently a paucity of bovine full-CDS mRNA and protein sequence data to support bovine genome assembly and functional genomics studies. Consequently, the production of high-quality bovine full-CDS cDNA sequences will enhance the bovine genome assembly and functional studies of bovine genes and gene products. The goal of this investigation was to identify and characterize the full-CDS sequences of bovine transcripts from clones identified in non-full-length enriched cDNA libraries. In contrast to several recent full-length cDNA investigations, these full-CDS cDNAs were selected, sequenced, and annotated without the benefit of the target organism's genomic sequence, by using comparison of bovine EST sequence to existing human mRNA to identify likely full-CDS clones for full-length insert cDNA (FLIC) sequencing.

Results

The predicted bovine protein lengths, 5' UTR lengths, and Kozak consensus sequences from 954 bovine FLIC sequences (bFLICs; average length 1713 nt, representing 762 distinct loci) are all consistent with previously sequenced mammalian full-length transcripts.

Conclusion

In most cases, the bFLICs span the entire CDS of the genes, providing the basis for creating predicted bovine protein sequences to support proteomics and comparative evolutionary research as well as functional genomics and genome annotation. The results demonstrate the utility of the comparative approach in obtaining predicted protein sequences in other species.
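The three per-transcript measures named in the Results (predicted protein length, 5' UTR length, and Kozak context) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes the CDS is the longest forward-strand ORF running from an ATG to an in-frame stop codon, and it uses a common simplified Kozak classification (purine at position -3 and G at position +4, with the A of ATG as +1). The function names `longest_orf`, `kozak_strength`, and `summarize` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    `end` is the index just past the stop codon. Returns None when no
    complete ORF (ATG followed by an in-frame stop) is present.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the previous stop in this frame
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def kozak_strength(seq, atg_pos):
    """Classify the start-codon context as 'strong', 'adequate', or 'weak'.

    Simplified rule (an assumption of this sketch): count a purine at -3
    and a G at +4; both = strong, one = adequate, neither = weak.
    """
    seq = seq.upper()
    minus3 = seq[atg_pos - 3] if atg_pos >= 3 else None
    plus4 = seq[atg_pos + 3] if atg_pos + 3 < len(seq) else None
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[hits]


def summarize(seq):
    """Report 5' UTR length (nt), protein length (aa), and Kozak class."""
    orf = longest_orf(seq)
    if orf is None:
        return None
    start, end = orf
    return {
        "utr5_len": start,                      # bases before the ATG
        "protein_len": (end - start) // 3 - 1,  # codons minus the stop
        "kozak": kozak_strength(seq, start),
    }


if __name__ == "__main__":
    # Toy sequence, not a real bovine transcript.
    print(summarize("GCCACCATGGCTTAAGG"))
```

Taking the longest ORF is only a heuristic. A real annotation step would also weigh similarity to known proteins, which is closer in spirit to the comparative approach described above.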

Highlights

Genome assemblies rely on the existence of transcript sequence to stitch together contigs, verify assembly of whole genome shotgun reads, and annotate genes.
Strategy for bovine full-coding sequence (CDS) selection and sequencing: The overall strategy for processing bovine FLIC sequences (bFLICs) is depicted in Figure 1 and is similar to an approach recently described for chicken bursal lymphocytes [15].
The bovine transcript sequences described here presently represent the largest publicly accessible resource of annotated full-CDS bFLICs. [Note added during review: since this manuscript's submission, 1710 bovine full-length insert cDNA sequences have been submitted to the Mammalian Gene Collection at NCBI by the Bovine Genome Sequencing Program, Genome Sequence Centre, BC Cancer Agency, Vancouver, BC, Canada.] The comparative genomics approach employed for clone selection and the database-driven sequencing and analysis pipeline provide a mechanism to target and produce full-CDS bFLICs for specific loci that are represented in available cDNA librar

Summary

Introduction

Genome assemblies rely on the existence of transcript sequence to stitch together contigs, verify assembly of whole genome shotgun reads, and annotate genes. Transcript sequence can be predicted based on reconstruction from overlapping expressed sequence tags (EST) that are obtained by single-pass sequencing of random cDNA clones, but these reconstructions are prone to errors caused by alternative splice forms, transcripts from gene families with related sequences, and expressed pseudogenes. These errors confound genome assembly and annotation. An intermediate level of resolution, and a critical check on the accuracy of the other methods, can be provided by determining whether the proper orientation, order, and spacing of exons in known expressed genes are maintained in the build. This approach requires knowledge of expressed transcript sequence to compare to the genome build.
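The exon-level check described above (orientation, order, and spacing) can be sketched in Python as follows. This is an illustrative sketch, not a method taken from the paper. It assumes each exon of a transcript has already been aligned to the build and is available as a record of transcript and genome coordinates. The `ExonHit` type, the `check_exon_layout` function, and the `max_intron` threshold of 500,000 bp are all hypothetical choices made for the example.

```python
from typing import NamedTuple


class ExonHit(NamedTuple):
    """One aligned exon (hypothetical record; coordinates are 0-based, half-open)."""
    t_start: int  # start on the transcript
    t_end: int    # end on the transcript
    contig: str   # contig or scaffold name in the build
    g_start: int  # start on the contig, forward-strand coordinates
    g_end: int    # end on the contig, forward-strand coordinates
    strand: str   # "+" or "-"


def check_exon_layout(hits, max_intron=500_000):
    """Return a list of problems found in the exon layout (empty if consistent)."""
    if not hits:
        return ["no alignments"]
    hits = sorted(hits, key=lambda h: h.t_start)  # walk exons in transcript order

    problems = []
    if len({h.contig for h in hits}) > 1:
        problems.append("exons split across contigs")
    if len({h.strand for h in hits}) > 1:
        problems.append("mixed strand orientation")
    if problems:
        # Order and spacing are not meaningful across contigs or strands.
        return problems

    strand = hits[0].strand
    for prev, cur in zip(hits, hits[1:]):
        # On "+", the next exon lies downstream; on "-", it lies upstream.
        if strand == "+":
            gap = cur.g_start - prev.g_end
        else:
            gap = prev.g_start - cur.g_end
        if gap < 0:
            problems.append(
                f"exon order inconsistent at transcript position {cur.t_start}"
            )
        elif gap > max_intron:
            problems.append(
                f"implausible spacing ({gap} bp) at transcript position {cur.t_start}"
            )
    return problems


if __name__ == "__main__":
    # Toy coordinates, not real alignments.
    consistent = [
        ExonHit(0, 100, "contig1", 1000, 1100, "+"),
        ExonHit(100, 250, "contig1", 5000, 5150, "+"),
    ]
    print(check_exon_layout(consistent))
```

A transcript that returns an empty list supports the local assembly. A non-empty list flags a region where either the build or the transcript reconstruction deserves a closer look.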

Journal: BMC Genomics	Publication Date: Nov 23, 2005
Citations: 52	License type: CC BY 2.0
