Abstract

To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the “Phylogenetics and Networks for Generalised HIV Epidemics in Africa” consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing (NGS) success rates varied by site: more than 80% of the viral genome from gag to nef could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except for HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We used simulations to assess how the missing-data patterns in PANGEA-HIV sequences affect phylogeny reconstruction. We identified a taxon-sampling threshold below which the patchy distribution of missing characters in NGS-derived sequences has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction; this excess impact is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and for identifying prevention opportunities. Molecular epidemiological analyses of these data must nonetheless proceed cautiously, because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
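The per-site success rates above count a sequence as successful when more than 80% of its gag-to-nef region could be determined. A minimal Python sketch of that bookkeeping follows. It assumes consensus sequences already aligned to a common reference, with undetermined positions written as `N`, `-` or `?`. The window coordinates, the function names `determined_fraction` and `success_rate_by_site`, and the toy sequences are hypothetical illustrations, not part of the PANGEA-HIV pipeline.

```python
from collections import defaultdict

# Characters treated as "not determined" in an aligned consensus sequence
# (an assumption for this sketch; real pipelines may use other conventions).
MISSING = set("N-?")


def determined_fraction(seq: str, start: int, end: int) -> float:
    """Fraction of positions in seq[start:end] that carry a called base.

    start/end are 0-based, end-exclusive alignment coordinates of a
    hypothetical gag-to-nef window.
    """
    window = seq[start:end].upper()
    if not window:
        raise ValueError("empty window")
    called = sum(1 for c in window if c not in MISSING)
    return called / len(window)


def success_rate_by_site(records, start, end, threshold=0.8):
    """Share of sequences per site whose window is MORE than `threshold` determined.

    records: iterable of (site, aligned_sequence) pairs.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for site, seq in records:
        total[site] += 1
        if determined_fraction(seq, start, end) > threshold:
            passed[site] += 1
    return {site: passed[site] / total[site] for site in total}


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical); the window is the first 10 columns.
    records = [
        ("SiteA", "ACGTACGTAC"),  # 10/10 determined -> passes
        ("SiteA", "ACGTACGTNN"),  # 8/10 determined -> not MORE than 80%
        ("SiteB", "ACGTNNNNNN"),  # 4/10 determined -> fails
        ("SiteB", "ACGTACGTA-"),  # 9/10 determined -> passes
    ]
    print(success_rate_by_site(records, start=0, end=10))
    # {'SiteA': 0.5, 'SiteB': 0.5}
```

The strict `>` comparison mirrors the "more than 80%" wording. Mapping gag and nef boundaries onto alignment columns (for example, via a reference genome) is left out of this sketch.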

Highlights

  • Next-generation sequencing data are available through the European Nucleotide Archive, and HIV-1 consensus sequences are available on request from the PANGEA-HIV steering committee (Supplementary Data).

  • PANGEA-HIV adopted a sequencing protocol that combined automated RNA extraction with amplification-dependent next-generation sequencing under the Gall protocol.[10]

Summary

Introduction

Viral phylogenetic methods are proving effective in addressing central questions in HIV-1 epidemiology: from characterizing ongoing transmission in vulnerable populations[1,2] to quantifying sources of transmission[3,4] and detecting HIV-1 outbreaks in near real time.[5] We report the first 3,985 PANGEA-HIV consensus sequences, which were generated at high throughput at the Wellcome Trust Sanger Institute on the Illumina MiSeq platform after automated extraction of viral RNA and amplification with a universal HIV-1 primer set.[10] The sequences come from diverse settings in sub-Saharan Africa: cohorts of the general population at several surveillance sites (Rakai Community Cohort Study,[11] Mochudi Prevention Project,[12,13] and the MRC/UVRI Uganda general population and fisherfolk cohorts[14–16]), a cohort of female sex workers (MRC/UVRI Uganda Good Health for Women[17]), historical sequences from the 1980s, and a cohort of individuals with drug-resistant HIV-1 from northern KwaZulu-Natal, South Africa (Africa Health Research Institute Resistance Cohort[18]).
