Abstract

Nucleotide sequences contain hidden information about the forces of conservation and variation that shaped their evolutionary history. The effort to extract this information motivates the study of sequence similarity among orthologous and paralogous coding sequences, and also gives impetus to improved methods of phylogenetic estimation and hypothesis testing. Variation within populations also provides evidence about evolutionary history. Within coding sequences, different patterns of variation are often observed between nonsynonymous nucleotide substitutions, which cause amino acid replacements, and synonymous nucleotide substitutions, which do not. For some coding sequences these differences are consistent with an evolutionary scenario featuring greater functional constraints on amino acid sequences than on nucleotide sequences. We have developed a sampling theory of selection and random genetic drift for interpreting the numbers of wildtype and variant nucleotides found among the polymorphic sites present in sequences of multiple alleles of a gene. This sampling theory has been used to interpret the patterns of intrapopulation polymorphism of 28 genes in Escherichia coli and Salmonella enterica, each gene exhibiting more than 50 polymorphic sites among the alleles examined. Many of these genes have an excess of singleton amino acid polymorphisms relative to the number of singleton synonymous polymorphisms. (A singleton polymorphism is one in which the sample is monomorphic except for a single variant.) In 22 of the 28 genes, the proportion of singleton nonsynonymous polymorphisms is greater than the proportion of singleton synonymous polymorphisms, and in 8 genes this excess is statistically significant. This pattern is consistent with a model in which most amino acid polymorphisms are slightly deleterious and hence present in samples at lower than expected frequencies. Furthermore, the sampling distribution of polymorphic synonymous nucleotide sites implies selection for optimal codon usage and enables estimation of the magnitude of the selection coefficients.
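
As an illustration of the singleton comparison described above, the following minimal sketch tabulates singleton and non-singleton polymorphic sites for the two substitution classes and applies a one-sided Fisher exact test. This is not the sampling theory developed in the paper; Fisher's test is swapped in here only as a simple stand-in for asking whether singletons are over-represented among nonsynonymous polymorphisms. The function names and the counts in the usage lines are hypothetical and are not taken from the 28 genes analyzed.

```python
from math import comb


def singleton_fraction(singletons, total):
    """Fraction of polymorphic sites that are singletons."""
    return singletons / total


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test on the 2x2 table [[a, b], [c, d]].

    a: nonsynonymous singleton sites     b: nonsynonymous non-singleton sites
    c: synonymous singleton sites        d: synonymous non-singleton sites

    Returns P(X >= a) under the hypergeometric null, i.e. the probability of
    seeing at least this many nonsynonymous singletons by chance when the
    margins of the table are held fixed.
    """
    row1 = a + b          # all nonsynonymous polymorphic sites
    col1 = a + c          # all singleton sites
    n = a + b + c + d     # all polymorphic sites
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts for a single gene (illustration only, not real data).
nonsyn_singletons, nonsyn_other = 12, 8
syn_singletons, syn_other = 15, 45

print(singleton_fraction(nonsyn_singletons, nonsyn_singletons + nonsyn_other))
print(singleton_fraction(syn_singletons, syn_singletons + syn_other))
print(fisher_one_sided(nonsyn_singletons, nonsyn_other, syn_singletons, syn_other))
```

A small p-value from such a test would indicate an excess of nonsynonymous singletons in that gene, which is the direction of effect the abstract reports; the paper's own inference rests on its sampling theory of selection and drift, not on this test.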
