Abstract

Xia et al. (2003) discuss whether the lengths of exons in eukaryotes and of genes in prokaryotes vary, and whether they do so in relation to base composition (G+C content). In the paper that generated this debate, Oliver and Marin (1996) suggested that, given the compositional AT bias of the standard stop codons (TAA, TAG, and TGA), a differential density of these termination signals is expected in random DNA sequences of different base composition, and therefore the expected length of reading frames (sequence segments of sense codons flanked by in-phase stop codons) is a function of GC content. In other words, in GC-poor random sequences the stop-codon density is expected to be higher than in GC-rich ones, and therefore the higher the GC content, the longer the expected reading frames. Empirical support for the model was sought by analyzing a sample of prokaryotic genes and a sample of eukaryotic exon data (Oliver and Marin 1996). With the model, the expected distribution of open reading frame (ORF) lengths in any random sequence with a given base composition can be computed; by comparing true ORF lengths to such random expectations, evolutionary forces involved in ORF lengthening can then be identified. Such comparisons can also be used for accurately predicting the coding content in anonymous sequences (Carpena et al. 2002). Xia et al. reevaluate Oliver and Marin’s work by examining a considerably wider sample of eukaryotic exons and of genes from 68 completely sequenced prokaryotic genomes. These authors question the suggested association between base composition and ORF length mediated by differential stop-codon probability. However, Xia et al. find that, with the exception of Mycoplasma genitalium and Treponema pallidum, a positive correlation exists between ORF length and ORF GC content.
Furthermore, a between-species comparison showed that the average ORF length (“genomic CDS length”) and the average ORF GC content (“genomic %GC”) are positively correlated among the 53 eubacterial genomes with a standard genetic code. Xia et al. acknowledge that the prediction by Oliver and Marin is largely fulfilled in prokaryotic genomes. In this regard, we would draw attention to the “natural experiment” provided by the four Mycoplasmataceae species (i.e., Mycoplasma genitalium, M. pneumoniae, M. pulmonis, and Ureaplasma urealyticum). These bacteria use a genetic code with only two stop codons (TAA and TAG), and, interestingly, the average ORF length in these species is longer than that of other bacteria with a similar GC content but using three stop codons (TAA, TAG, and TGA). Thus, as acknowledged by Xia et al., the lower probability of encountering a stop codon could be responsible for, or at least is associated with, the longer average ORF length in these species, and this

J Mol Evol (2003) 56:371–372. DOI: 10.1007/s00239-002-2407-0
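
The dependence of stop-codon density on base composition described above can be illustrated with a minimal sketch. It is not necessarily the exact formulation used by Oliver and Marin (1996): it assumes a random sequence with independent bases, P(G) = P(C) and P(A) = P(T), and takes the number of sense codons before the first in-phase stop as geometrically distributed. The function names and the two codon sets are illustrative choices.

```python
# Illustrative sketch only; assumes independent bases with
# P(G) = P(C) = gc / 2 and P(A) = P(T) = (1 - gc) / 2.

STANDARD_STOPS = ("TAA", "TAG", "TGA")
TWO_STOP_CODE = ("TAA", "TAG")  # code in which TGA is read as a sense codon


def base_probabilities(gc):
    """Return base frequencies for a random sequence with G+C fraction gc."""
    if not 0.0 <= gc < 1.0:
        raise ValueError("gc must be in [0, 1)")
    at_half = (1.0 - gc) / 2.0
    gc_half = gc / 2.0
    return {"A": at_half, "T": at_half, "G": gc_half, "C": gc_half}


def stop_probability(gc, stops=STANDARD_STOPS):
    """Probability that a random in-phase codon is one of the stop codons."""
    probs = base_probabilities(gc)
    return sum(probs[c[0]] * probs[c[1]] * probs[c[2]] for c in stops)


def expected_sense_codons(gc, stops=STANDARD_STOPS):
    """Mean number of sense codons before the first in-phase stop
    (mean of a geometric distribution with success probability p)."""
    p = stop_probability(gc, stops)
    return (1.0 - p) / p


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        three = expected_sense_codons(gc, STANDARD_STOPS)
        two = expected_sense_codons(gc, TWO_STOP_CODE)
        print(f"GC={gc:.1f}  three stops: {three:.1f}  two stops: {two:.1f}")
```

Under these assumptions, at 50% GC the standard code gives a stop probability of 3/64 per codon (about 20.3 sense codons expected before a stop), whereas a two-stop code gives 1/32 (31 sense codons). The expected length also grows with GC content for either codon set, in line with the qualitative prediction discussed in the abstract.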
