A New Method for Estimating Nonsynonymous Substitutions and Its Applications to Detecting Positive Selection

Hua Tang, Chung-I Wu

doi:10.1093/molbev/msj043

Abstract

The standard methods for computing the number of nonsynonymous substitutions (Ka) lump all amino acid changes into a single class, even though their rates of substitution vary by at least 10-fold (Tang et al., 2004). Classifying these changes by their physicochemical properties has not been sufficiently effective in isolating the fastest-evolving classes of changes. We now propose to use the Universal index U of Tang et al. (2004) to classify the 75 elementary amino acid changes (changes between codons that differ by 1 bp) by their evolutionary exchangeability. Let Ki denote the Ka value of each class (i = 1, ..., 75, ordered from the most to the least exchangeable). The cumulative Ki for the top 10 classes, denoted Kh (for high-exchangeability types), has two important properties: (1) Kh usually accounts for 25%-30% of all amino acid changes, and (2) when the observed number of amino acid substitutions is large, Kh is predictably twice the value of Ka. We refer to this as the twofold approximation. The new method for estimating Kh is applied to comparisons between human and macaque and between mouse and rat. The twofold approximation holds well in these data sets, and the signature of positive selection is more easily discerned with the Kh statistic than with Ka. Many genes with Ka/Ks > 0.5 can now be shown to have Kh/Ks > 1 and thus to have evolved adaptively, at least for the high-exchangeability group of amino acid changes.
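
The relationship between the per-class values Ki, the pooled Kh, and the overall Ka can be made concrete with a small calculation. What follows is a minimal sketch under stated assumptions, not the authors' estimator. It assumes the 75 classes are already ranked by exchangeability (for example by U) and that the number of nonsynonymous sites and observed changes for each class are given as inputs. It uses uncorrected proportions (p-distances) with no multiple-hit correction, and it takes Kh to be the distance pooled over the 10 top-ranked classes. The names `ChangeClass`, `pooled_p_distance`, and `ka_and_kh`, as well as all numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ChangeClass:
    """One elementary amino acid change class (all fields are hypothetical inputs)."""
    rank: int       # 1 = most exchangeable; assumed to come from an index such as U
    sites: float    # nonsynonymous sites attributable to this class (assumed precomputed)
    changes: float  # observed substitutions of this class between the two sequences


def pooled_p_distance(classes: Sequence[ChangeClass]) -> float:
    """Observed changes per site, pooled over `classes`; no multiple-hit correction."""
    total_sites = sum(c.sites for c in classes)
    if total_sites <= 0:
        raise ValueError("classes contain no sites")
    return sum(c.changes for c in classes) / total_sites


def ka_and_kh(classes: Sequence[ChangeClass], top_n: int = 10) -> tuple[float, float]:
    """Return (Ka, Kh): Ka pools every class, Kh pools only the top_n classes by rank."""
    ranked = sorted(classes, key=lambda c: c.rank)
    return pooled_p_distance(ranked), pooled_p_distance(ranked[:top_n])


# Toy numbers invented for illustration only: 75 classes with 10 sites each;
# the 10 most exchangeable classes carry 2 changes each, the rest 1 change each.
toy = [ChangeClass(rank=i, sites=10.0, changes=2.0 if i <= 10 else 1.0)
       for i in range(1, 76)]
ks = 0.15  # hypothetical synonymous divergence (Ks)

ka, kh = ka_and_kh(toy)
print(f"Ka = {ka:.3f}, Kh = {kh:.3f}, Kh/Ka = {kh / ka:.2f}")  # Ka = 0.113, Kh = 0.200, Kh/Ka = 1.76
print(f"Ka/Ks = {ka / ks:.2f}, Kh/Ks = {kh / ks:.2f}")        # Ka/Ks = 0.76, Kh/Ks = 1.33
```

With these invented inputs, Ka/Ks exceeds 0.5 while Kh/Ks exceeds 1, which is the pattern the abstract describes. The toy ratio Kh/Ka is 1.76 only because of the chosen numbers. A real analysis would also require a significance test rather than a comparison of point estimates.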
