Abstract
Background: Although bivalves are among the most-studied marine organisms because of their ecological role and economic importance, very little information is available on the genome sequences of oyster species. This report documents three large-scale cDNA sequencing projects for the Pacific oyster Crassostrea gigas, initiated to provide a large number of expressed sequence tags (ESTs) that were subsequently compiled in a publicly accessible database. This resource allowed a large number of transcripts to be identified and provides valuable information for ongoing investigations of tissue-specific and stimulus-dependent gene expression patterns. These data are crucial for constructing comprehensive DNA microarrays, for identifying single-nucleotide polymorphisms and microsatellites in coding regions, and for identifying genes when the entire genome sequence of C. gigas becomes available.
Description: In the present paper, we report the production of 40,845 high-quality ESTs that identify 29,745 unique transcribed sequences consisting of 7,940 contigs and 21,805 singletons. All of these new sequences, together with existing public sequence data, have been compiled into a publicly available website (http://public-contigbrowser.sigenae.org:9090/Crassostrea_gigas/index.html). Approximately 43% of the unique ESTs had significant matches against the SwissProt database and 27% were annotated using Gene Ontology terms. In addition, we identified a total of 208 in silico microsatellites from the ESTs, 173 of which had sufficient flanking sequence for primer design. We also identified a total of 7,530 putative in silico single-nucleotide polymorphisms using existing and newly generated EST resources for the Pacific oyster.
Conclusion: A publicly available database has been populated with 29,745 unique sequences for the Pacific oyster Crassostrea gigas. The database provides many tools to search cleaned and assembled ESTs. The user may input and submit several filters, such as protein or nucleotide hits, to select and download relevant elements. This database constitutes one of the most developed genomic resources accessible among Lophotrochozoans, an orphan clade of bilaterian animals. These data will accelerate the development of both genomics and genetics in a commercially important species with the highest annual commercial production of any aquatic organism.
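To illustrate what an in silico microsatellite screen with a flanking-sequence check can look like, here is a minimal Python sketch. It is not the pipeline used for the oyster ESTs: the repeat thresholds (`MIN_REPEATS`), the flank length (`MIN_FLANK`), and the function name `find_microsatellites` are all hypothetical choices made for this example.

```python
import re

# Hypothetical thresholds (assumptions, not taken from the paper):
# minimum number of tandem copies required for each motif length.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}
# Hypothetical minimum flank length (bases) on each side for primer design.
MIN_FLANK = 20


def is_periodic(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AA, ACAC)."""
    return motif in (motif + motif)[1:-1]


def find_microsatellites(seq, min_repeats=MIN_REPEATS, min_flank=MIN_FLANK):
    """Scan one EST for perfect tandem repeats and flag primer-ready loci."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_periodic(motif):
                continue  # homopolymers and shorter motifs counted elsewhere
            hits.append({
                "motif": motif,
                "repeats": (match.end() - match.start()) // motif_len,
                "start": match.start(),
                "end": match.end(),
                # Enough sequence on both sides to place a primer pair?
                "primer_ready": match.start() >= min_flank
                and len(seq) - match.end() >= min_flank,
            })
    return hits


if __name__ == "__main__":
    left = "ATGCGTACCTGAGTCAATGCCGTA"
    right = "GTTCAGGCTTAACGGATCCATGCA"
    for hit in find_microsatellites(left + "AC" * 8 + right):
        print(hit)
```

Separating detection from the `primer_ready` flag mirrors the distinction the abstract draws between all detected microsatellites and the subset with enough flanking sequence for primer design.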
Highlights
Although bivalves are among the most-studied marine organisms because of their ecological role and economic importance, very little information is available on the genome sequences of oyster species.
Several factors motivate further development of genomic resources for C. gigas: (I) Because this species has the highest annual production of any aquatic organism, C. gigas has been the subject of a great deal of research to elucidate the molecular basis of the physiological and genetic mechanisms underlying economically relevant traits. (II) The Pacific oyster's phylogenetic position in the Lophotrochozoa, an understudied clade of bilaterian animals, makes molecular data on C. gigas highly relevant for studies of genome evolution. (III) Oysters play an important role as sentinels in estuarine and coastal marine habitats, where increasing human activities exacerbate the impacts of disease and stress in exploited populations. (IV) C. gigas can be an invasive species when introduced into new habitats [8].
The genomic strategies currently employed to identify novel and previously characterized genes affecting phenotypes of interest in the Pacific oyster include the identification of quantitative trait loci (QTL) and high-throughput studies of gene expression [21].
Summary
We report the production and sequencing of clones from 9 cDNA libraries derived from different C. gigas tissues and from oysters sampled under different conditions, obtaining 40,845 high-quality ESTs that identify 29,745 unique transcribed sequences. Putative annotation was assigned to the 43% of sequences that showed similarity to known genes, mostly from other species, in one or more of the databases used for automatic annotation. All data on ESTs, clustering, and annotation can be accessed from the dedicated database, GigasDatabase, available at http://public-contigbrowser.sigenae.org:9090/Crassostrea_gigas/index.html.
One table lists 12,790 non-redundant sequences showing significant similarity (E-value < 10^-6) with predicted proteins from mollusks and other organisms; it includes the GenBank accession numbers of the ESTs and the corresponding best SwissProt hit descriptions.
[Table: number of contigs and putative SNP sites by contig depth (> 50, 11–50, 6–10, 5, 4, 3, and 2 sequences); values not included in this extract.]
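The E-value cutoff mentioned above can be applied as a simple filter over similarity-search results. The sketch below assumes the hits are in the 12-column tab-separated layout written by BLAST+ with `-outfmt 6` (query ID in column 1, subject ID in column 2, E-value in column 11). The source does not state which search tool or file format was used, so that layout, the function name `best_hits`, and the sample identifiers are assumptions for illustration only.

```python
import csv
import io

# Cutoff quoted in the text: hits must have E-value < 10^-6.
EVALUE_CUTOFF = 1e-6


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)} keeping the lowest E-value
    per query among hits below the cutoff.

    Assumes the 12-column BLAST tabular layout (an assumption, see above).
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff:
            continue  # not significant under the chosen threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


if __name__ == "__main__":
    # Made-up identifiers, purely illustrative.
    sample = "\n".join([
        "est1\tprotA\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-20\t150",
        "est1\tprotB\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-08\t90",
        "est2\tprotC\t35.0\t50\t30\t2\t1\t150\t1\t50\t0.01\t30",
    ])
    hits = best_hits(sample)
    annotated_fraction = len(hits) / 2  # two distinct queries in the sample
    print(hits, annotated_fraction)
```

Dividing the number of queries with a retained hit by the total number of unique sequences gives the kind of annotated fraction reported in the text.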