Abstract

Large-scale parallel pyrosequencing produces unprecedented quantities of sequence data. However, when such data are generated from viral populations, current mapping software is inadequate for dealing with the high levels of variation present, resulting in the potential for biased data loss. In order to apply the 454 Life Sciences' pyrosequencing system to the study of viral populations, we have developed software for the processing of highly variable sequence data. Here we demonstrate our software by analyzing two temporally sampled HIV-1 intra-patient datasets from a clinical study of maraviroc. This drug binds the CCR5 coreceptor, thus preventing HIV-1 infection of the cell. The objective is to determine viral tropism (CCR5 versus CXCR4 usage) and track the evolution of minority CXCR4-using variants that may limit the response to a maraviroc-containing treatment regimen. Five time points (two prior to treatment) were available from each patient. We first quantify the effects of divergence on initial read k-mer mapping and demonstrate the importance of utilizing population-specific template sequences in the analysis of next-generation sequence data. Then, in conjunction with coreceptor prediction algorithms that infer HIV tropism, our software was used to quantify the viral population structure pre- and post-treatment. In both cases, low-frequency CXCR4-using variants (2.5–15%) were detected prior to treatment. Following phylogenetic inference, these variants were observed to exist as distinct lineages that were maintained through time. Our analysis thus confirms the role of pre-existing CXCR4-using virus in the emergence of maraviroc-insensitive HIV. The software will have utility for the study of intra-host viral diversity and evolution of other fast-evolving viruses, and is available from http://www.bioinf.manchester.ac.uk/segminator/.

Highlights

  • Sequencing platforms, such as the 454 Life Sciences’ GS-FLX pyrosequencing system, have greatly parallelized the determination of nucleotide order within genetic material, resulting in the ability to produce extremely large datasets [1]

  • We apply the software to the analysis of two HIV-1 infected individuals who did not respond optimally to the drug maraviroc

  • In each case, fewer reads were mapped when HXB2 was used as the template sequence

Introduction

Sequencing platforms, such as the 454 Life Sciences’ GS-FLX pyrosequencing system, have greatly parallelized the determination of nucleotide order within genetic material, resulting in the ability to produce extremely large datasets [1]. The vast numbers of short sequence segments produced (termed reads), in conjunction with the intrinsic error rates associated with the sequencing platform [2,3], pose challenging computational problems [4,5]. These data have the potential to provide unprecedented insight into the extent of pathogen variation (diversity) that exists within a single individual. For highly variable genomes, this limitation will result in data loss, as reads with more than the specified number of mismatches relative to a template sequence are discarded. This loss can occur non-randomly, with reads representing minority subpopulations being less likely to be mapped to the template. When mapping V3 data to HXB2, and limiting the number of mismatches allowed, reads
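The non-random read loss described above can be illustrated with a minimal sketch. It is not the authors' software: the function names, the toy sequences, and the assumption that every read is already aligned to the start of the template are all hypothetical, chosen only to show how a fixed mismatch cutoff can remove a divergent minority variant while retaining the majority.

```python
def mismatches(read, template, offset=0):
    """Count mismatches between a read and the template window at `offset`.

    Hypothetical helper: assumes an ungapped alignment of the read to the
    template (no insertions or deletions).
    """
    window = template[offset:offset + len(read)]
    return sum(a != b for a, b in zip(read, window))


def mapped(reads, template, max_mismatches):
    """Keep only reads within `max_mismatches` of the template."""
    return [r for r in reads if mismatches(r, template) <= max_mismatches]


# Toy data (invented for illustration, not real HIV-1 sequence).
template = "ACGTACGTACGTACGTACGT"
majority = "ACGTACGTACGTACGAACGT"  # 1 mismatch to the template
minority = "ACGAACGTTCGTACCTACGA"  # 4 mismatches to the template

# A population in which the divergent variant makes up 10% of reads.
reads = [majority] * 9 + [minority]

strict = mapped(reads, template, max_mismatches=2)
relaxed = mapped(reads, template, max_mismatches=4)

print(len(strict), minority in strict)    # minority variant is discarded
print(len(relaxed), minority in relaxed)  # minority variant is retained
```

With the stricter cutoff, every discarded read belongs to the minority variant, so its estimated frequency falls to zero rather than being reduced in proportion. A template closer to the sampled population would lower the mismatch count for such reads, which is the motivation for population-specific templates.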
