Hypermutated proviruses, which arise in a single HIV replication cycle when host antiviral APOBEC3 proteins introduce extensive G-to-A mutations throughout the viral genome, persist in all people living with HIV receiving antiretroviral therapy (ART). However, the within-host evolutionary origins of hypermutated sequences are incompletely understood because phylogenetic inference algorithms, which assume that mutations accumulate gradually over generations, incorrectly reconstruct their ancestor-descendant relationships. Using >1400 longitudinal single-genome-amplified HIV env-gp120 sequences isolated from six women over a median 18 years of follow-up (including plasma HIV RNA sequences collected over a median 9 years between seroconversion and ART initiation, and >500 proviruses isolated over a median 9 years on ART), we evaluated three approaches for removing hypermutation from nucleotide alignments. Our goals were to 1) reconstruct accurate phylogenies that can be used for molecular dating and 2) phylogenetically infer the integration dates of hypermutated proviruses persisting during ART. Two of the tested approaches (stripping all positions containing putative APOBEC3 mutations from the alignment, or replacing individual putative APOBEC3 mutations in hypermutated sequences with the ambiguous base R) consistently normalized tree topologies, eliminated erroneous clustering of hypermutated proviruses, and brought env-intact and hypermutated proviruses into comparable ranges with respect to multiple tree-based metrics. Importantly, these corrected trees produced integration date estimates for env-intact proviruses that were highly concordant with those from benchmark trees that excluded hypermutated sequences, indicating that the corrected trees can be used for molecular dating.
Using these trees to infer the integration dates of hypermutated proviruses persisting during ART revealed that they spanned a wide age range, with the oldest dating to shortly after infection. This indicates that hypermutated proviruses, like other provirus types, begin to be seeded into the proviral pool immediately following infection and can persist for decades. In two of the six participants, the age distributions of hypermutated proviruses differed from those of env-intact ones, suggesting that different provirus types decay at heterogeneous rates in some hosts. These simple approaches to reconstructing the evolutionary histories of hypermutated proviruses allow insights into their in vivo origins and longevity, towards a more comprehensive understanding of HIV persistence during ART.
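The two correction approaches named above (stripping affected alignment columns, or masking individual putative APOBEC3 mutations with the ambiguous base R) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, and the rule used to call a putative APOBEC3 mutation (a reference G read as A in the query, followed in the query by A/G and then A/G/T) is a simplifying assumption chosen for illustration. Sequences are assumed to be uppercase, pre-aligned strings of equal length, with `-` marking gaps.

```python
def next_two_bases(seq, i):
    """Return up to two non-gap characters that follow index i in seq."""
    out = []
    for ch in seq[i + 1:]:
        if ch != "-":
            out.append(ch)
            if len(out) == 2:
                break
    return "".join(out)


def putative_apobec3_sites(ref, seq):
    """Alignment columns where ref has G and seq has A in an assumed APOBEC3 context.

    Assumed context (illustrative only): the next two non-gap query bases
    are R (A or G) followed by D (A, G, or T).
    """
    sites = []
    for i, (r, q) in enumerate(zip(ref, seq)):
        if r == "G" and q == "A":
            ctx = next_two_bases(seq, i)
            if len(ctx) == 2 and ctx[0] in "AG" and ctx[1] in "AGT":
                sites.append(i)
    return sites


def mask_with_r(ref, seq):
    """Replace each putative APOBEC3 mutation in seq with the ambiguous base R."""
    sites = set(putative_apobec3_sites(ref, seq))
    return "".join("R" if i in sites else ch for i, ch in enumerate(seq))


def strip_columns(ref, seqs):
    """Drop every column holding a putative APOBEC3 mutation in any sequence."""
    drop = set()
    for s in seqs:
        drop.update(putative_apobec3_sites(ref, s))
    keep = [i for i in range(len(ref)) if i not in drop]
    return ["".join(s[i] for i in keep) for s in seqs]


# Toy alignment: the query carries two G-to-A changes, only one in context.
reference = "AGGATGGCT"
hypermutant = "AAGATAGCT"
masked = mask_with_r(reference, hypermutant)
stripped = strip_columns(reference, [reference, hypermutant])
```

Masking keeps the alignment length unchanged and alters only the flagged sequence, whereas stripping shortens every sequence in the alignment; the G-to-A change at the second site is left alone in both cases because it falls outside the assumed context.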