Abstract

Hypermutated proviruses, which arise in a single HIV replication cycle when host antiviral APOBEC3 proteins introduce extensive G-to-A mutations throughout the viral genome, persist in all people living with HIV receiving antiretroviral therapy (ART). However, hypermutated sequences are routinely excluded from phylogenetic trees because their extensive mutations complicate phylogenetic inference, and as a result we know relatively little about their within-host evolutionary origins and dynamics. Using >1400 longitudinal single-genome-amplified HIV env-gp120 sequences isolated from six women over a median of 18 years of follow-up (including plasma HIV RNA sequences collected over a median of 9 years between seroconversion and ART initiation, and >500 proviruses isolated over a median of 9 years on ART), we evaluated three approaches for masking hypermutation in nucleotide alignments. Our goals were to (1) reconstruct phylogenies that can be used for molecular dating and (2) phylogenetically infer the integration dates of hypermutated proviruses persisting during ART. Two of the approaches (stripping all positions containing putative APOBEC3 mutations from the alignment, or replacing individual putative APOBEC3 mutations in hypermutated sequences with the ambiguous base R) consistently normalized tree topologies, eliminated erroneous clustering of hypermutated proviruses, and brought env-intact and hypermutated proviruses into comparable ranges with respect to multiple tree-based metrics. Importantly, these corrected trees produced integration date estimates for env-intact proviruses that were highly concordant with those from benchmark trees that excluded hypermutated sequences, supporting the use of these corrected trees for molecular dating. Subsequent molecular dating of hypermutated proviruses revealed that these sequences spanned a wide age range, with the oldest ones dating to shortly after infection. This indicates that hypermutated proviruses, like other provirus types, begin to be seeded into the proviral pool immediately following infection and can persist for decades. In two of the six participants, hypermutated proviruses differed from env-intact ones in their age distributions, suggesting that different provirus types decay at heterogeneous rates in some hosts. These simple approaches to reconstructing the evolutionary histories of hypermutated proviruses reveal insights into their in vivo origins and longevity, towards a more comprehensive understanding of HIV persistence during ART.
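The two masking approaches that performed well can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequences, and the rule used to flag a "putative APOBEC3 mutation" are all assumptions made for illustration. Here a site is flagged when the reference has G, the query has A, and the next two non-gap bases in the query are R (A or G) followed by D (not C), which is a simplified version of the sequence context commonly used to detect APOBEC3-mediated hypermutation. The abstract does not specify the rule the study applied.

```python
def is_putative_apobec3_site(ref: str, seq: str, i: int) -> bool:
    """Assumed rule: reference G, query A, then R (A/G) and D (A/G/T) downstream in the query."""
    if ref[i] != "G" or seq[i] != "A":
        return False
    # Take the next two non-gap bases of the query as the downstream context.
    downstream = [b for b in seq[i + 1:] if b != "-"][:2]
    if len(downstream) < 2:
        return False
    return downstream[0] in "AG" and downstream[1] in "AGT"


def mask_with_r(ref: str, seq: str) -> str:
    """Replace each flagged site in one hypermutated sequence with the ambiguity code R (A or G)."""
    return "".join(
        "R" if is_putative_apobec3_site(ref, seq, i) else base
        for i, base in enumerate(seq)
    )


def strip_columns(ref: str, alignment: dict[str, str], hypermutated: list[str]) -> dict[str, str]:
    """Drop every alignment column flagged in any hypermutated sequence, from all sequences."""
    flagged = {
        i
        for name in hypermutated
        for i in range(len(ref))
        if is_putative_apobec3_site(ref, alignment[name], i)
    }
    keep = [i for i in range(len(ref)) if i not in flagged]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy data (hypothetical): "hyper" carries two G-to-A changes in an APOBEC3-like
# context; "intact" carries one G-to-A change outside that context.
ref = "ATGGAGTCGGAT"
alignment = {"hyper": "ATAGAGTCAGAT", "intact": "ATGGAATCGGAT"}

masked = mask_with_r(ref, alignment["hyper"])
stripped = strip_columns(ref, alignment, ["hyper"])
```

In this sketch, R-masking keeps the alignment length unchanged and touches only the hypermutated sequence, whereas column stripping shortens every sequence and so discards information from non-hypermutated sequences at those positions as well.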