Abstract

Accurately identifying mutations under beneficial selection in viral genomes is crucial for understanding their molecular evolution and pathogenicity. Traditional methods such as the Ka/Ks test, which compares non-synonymous (Ka) and synonymous (Ks) substitution rates, assume that substitutions at synonymous sites are neutral, so that Ks equals the mutation rate (µ). Yet evidence suggests that synonymous sites in translated regions (TR) and sites in untranslated regions (UTR) can be under strong beneficial selection (Ks > µ) or strongly conserved (Ks ≈ 0), and either case can lead codon-by-codon Ka/Ks analysis to falsely predict adaptive mutations. Our previous work used a relative substitution rate test (c/µ, where c is the substitution rate in UTR or TR and µ is the mutation rate) to identify adaptive mutations in the SARS-CoV-2 genome without assuming that synonymous sites are neutral. This study refines the c/µ test by optimizing the value of µ, which yields a smaller set of nucleotide and amino acid sites under beneficial selection in both UTR (11 sites with c/µ > 3) and TR (69 nonsynonymous sites with c/µ > 3 and Ka/Ks > 2.5; 107 synonymous sites with Ks/µ > 3). Encouragingly, the top two mutations in UTR and 70% of the top nonsynonymous mutations in TR had reported or predicted effects in the literature. Molecular modeling of the top adaptive mutations in several critical proteins (S, NSP11 and NSP5) was carried out to elucidate the possible molecular mechanisms of their adaptivity.
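The selection criteria stated above can be illustrated with a minimal sketch. It encodes only the three thresholds quoted in the abstract (c/µ > 3 for UTR sites; c/µ > 3 and Ka/Ks > 2.5 for nonsynonymous TR sites; Ks/µ > 3 for synonymous TR sites). The function name `is_candidate`, the site-type labels, the argument layout, and the handling of Ks = 0 are assumptions made for illustration and are not taken from the study; how the rates themselves and the optimized µ are estimated is not shown.

```python
from typing import Optional

# Thresholds quoted in the abstract.
C_OVER_MU_MIN = 3.0    # UTR sites and nonsynonymous TR sites
KA_OVER_KS_MIN = 2.5   # nonsynonymous TR sites (additional criterion)
KS_OVER_MU_MIN = 3.0   # synonymous TR sites


def is_candidate(site_type: str, mu: float,
                 c: Optional[float] = None,
                 ka: Optional[float] = None,
                 ks: Optional[float] = None) -> bool:
    """Return True if a site passes the abstract's thresholds (hypothetical helper).

    site_type is one of "utr", "nonsynonymous", "synonymous" (labels assumed here).
    """
    if mu <= 0:
        raise ValueError("mutation rate mu must be positive")

    if site_type == "utr":
        return c / mu > C_OVER_MU_MIN

    if site_type == "nonsynonymous":
        # Assumption: when Ks is zero the Ka/Ks ratio is undefined,
        # so this sketch leaves the site unflagged.
        if ks <= 0:
            return False
        return c / mu > C_OVER_MU_MIN and ka / ks > KA_OVER_KS_MIN

    if site_type == "synonymous":
        return ks / mu > KS_OVER_MU_MIN

    raise ValueError(f"unknown site_type: {site_type!r}")
```

For example, with made-up rates, `is_candidate("utr", mu=1.0, c=4.0)` passes, while a nonsynonymous site with c/µ = 4 but Ka/Ks = 2 does not, because both criteria must hold for that class.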