Incomplete selection makes it challenging to infer selection on genes at short time scales, especially in microorganisms, where linkage between loci is stronger. However, the selective force often changes with environment, time, or other factors, and understanding selection at this level is of great interest for answering relevant biological questions. We developed a new method that uses the change in dN/dS, rather than its absolute value, to infer the dominant selective force from sequence data across geographical scales. If a gene is under positive selection, dN/dS is expected to increase through time, whereas if a gene is under negative selection, dN/dS is expected to decrease through time. Assuming that the migration rate decreases and the divergence time between samples increases from the within-country, to the within-continent different-country, to the between-continent level, dN/dS of a gene dominated by positive selection is expected to increase with increasing geographical scale, and the opposite trend is expected under negative selection. Motivated by the McDonald-Kreitman (MK) test, we developed a pairwise MK test to assess the statistical significance of detected trends in dN/dS. Application of the method to a global sample of dengue virus genomes identified multiple significant signatures of selection in both the structural and non-structural proteins. Because this method does not require allele frequency estimates and uses synonymous mutations for comparison, it is less prone to sampling error, providing a way to infer selective forces within species using publicly available genomic data from locations spanning broad geographical scales.
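The abstract does not spell out the pairwise MK test, so the following is only a minimal sketch of the general idea, under stated assumptions. It borrows the 2x2 contingency layout of the classic MK test (nonsynonymous versus synonymous counts) and applies it to two geographical scales, using Fisher's exact test as a stand-in for whatever statistic the authors actually use. The function names and the example counts are hypothetical, and the raw N/S count ratio is used as a proxy for dN/dS, which is reasonable only when both scales are compared for the same gene (so the numbers of synonymous and nonsynonymous sites cancel).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def compare_scales(n_small, s_small, n_large, s_large):
    """Compare N/S between a smaller and a larger geographical scale.

    n_* and s_* are counts of nonsynonymous and synonymous differences
    (hypothetical inputs; how they are counted is not given in the abstract).
    Returns (ratio_small, ratio_large, trend, p_value).
    """
    if s_small == 0 or s_large == 0:
        raise ValueError("synonymous counts must be non-zero")
    ratio_small = n_small / s_small
    ratio_large = n_large / s_large
    if ratio_large > ratio_small:
        trend = "increasing"   # pattern expected under positive selection
    elif ratio_large < ratio_small:
        trend = "decreasing"   # pattern expected under negative selection
    else:
        trend = "flat"
    p_value = fisher_exact_two_sided(n_small, s_small, n_large, s_large)
    return ratio_small, ratio_large, trend, p_value


# Made-up counts for one gene: within-country vs. between-continent pairs.
result = compare_scales(n_small=10, s_small=40, n_large=30, s_large=40)
print(result)
```

A trend of "increasing" with a small p-value would match the pattern the abstract associates with positive selection, and "decreasing" the pattern associated with negative selection. Repeating the comparison for each pair of scales would also call for a multiple-testing correction, which this sketch omits.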