Abstract

The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y′) versus log10-transformed genome size (X′, genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y′ = ln(-46.200+22.678X′, whereas non-eukaryotes a linear model, Y′ = 0.045+0.977X′, both with high significance (p<0.001, R2>0.91). Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%–1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%–47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3×106 kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245×106 kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.

Highlights

  • An increasing amount of evidence supports a general positive correlation between gene content and genome size in prokaryotes and small eukaryotes, but whether this trend applies to all eukaryotes has been questioned and remains to be investigated [1,2,3]

  • Data collection Data up to date by February 2009 were retrieved from the Reference Sequence (RefSeq) collection in the National Center for Biotechnology information (NCBI; http://www.ncbi.nlm.nih.gov), the Integrated Microbial Genomes (IMG) system in DOE Joint Genome Institute (JGI; http://img.jgi.doe.gov), and peer-reviewed publications (Supplemental Table S1)

  • Dinoflagellate gene-coding percentages were estimated based on this formula:6100%, where average gene length was approximated as 1.346 kbp, a value previously found highly conserved in eukaryots [18]

Read more

Summary

Introduction

An increasing amount of evidence supports a general positive correlation between gene content and genome size in prokaryotes and small eukaryotes, but whether this trend applies to all eukaryotes has been questioned and remains to be investigated [1,2,3]. Dinoflagellates are one of the largest algal groups in the ocean, contributing significantly to oceanic primary production and coral reef building. The smallest documented dinoflagellate genomes are found in the coral reef symbiont Symbiodinium spp., ranging from 1.5 to 4.8 (average ,3) pg DNA per haploid genome [6], while the largest (250 pg DNA per haploid genome) is found in Prorocentrum micans [7]. How many genes are encoded in the genomes of these unicellular and seemingly simple organisms remains a question, which potentially bears significance on eukaryotic genome evolution. Information on gene contents of dinoflagellate genomes will allow researchers to gain understanding on how the large genomes favor or disfavor these organisms in their wide range of habitats

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.