Abstract

The third-generation sequencing technology PacBio has shown an ability to sequence HIV amplicons in their full length. Compared with the short reads from Illumina shotgun sequencing, the long reads of PacBio offer a distinct advantage for comprehensively understanding the complexity of virus evolution at the quasispecies level, because they maintain the linkage information of variants. However, due to the high-noise nature of PacBio reads, it is still a challenge to build accurate contigs at high sensitivity. Most previously developed NGS assembly tools assume that the input reads are fairly accurate, which is largely true for data derived from Sanger or Illumina technologies. When these tools are applied to high-noise PacBio reads, they are largely driven by noise rather than true signal, which leads to poor results in most cases. In this study, we propose a de novo assembly procedure that combines a positive-focused strategy with linkage-frequency noise reduction, making it more suitable for high-noise PacBio reads. We further tested the procedure on HIV PacBio benchmark data and clinical samples, where it accurately assembled both dominant and minor populations of HIV quasispecies, as expected. The improved de novo assembly procedure shows potential to promote the use of PacBio technology in clinical detection of HIV drug resistance, as well as in broader HIV phylogenetic studies.

Highlights

  • HIV strains mutate frequently from one HIV generation to the next, resulting in high genetic diversity of the HIV populations in a given infected host over a time period [1, 2]

  • The unique linkage-frequency-based noise reduction plays a key role in the de novo assembly: a clear observation is that the noise in PacBio reads occurs randomly [8]

  • The effectiveness of linkage-frequency-based noise reduction is due to the biological nature of how variants arise, which makes it possible to distinguish random noise from true biological variants during the experimental procedure, rather than correcting PacBio noise based on a pre-defined mathematical distribution model
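
The intuition behind the last two highlights can be illustrated with a small sketch. It is not the authors' implementation: the function name `linked_variant_pairs`, the `min_pair_freq` threshold, and the toy reads are all hypothetical, and the reads are assumed to be already aligned to a reference of equal length. The idea shown is only that random errors rarely recur together on the same pair of positions across reads, whereas true variants of a quasispecies tend to co-occur.

```python
from collections import Counter
from itertools import combinations


def linked_variant_pairs(reads, reference, min_pair_freq=0.2):
    """Keep variant pairs that co-occur on the same read often enough.

    Illustrative assumption: reads are pre-aligned, equal-length strings,
    and a "variant" is any (position, base) that differs from the reference.
    """
    pair_counts = Counter()
    for read in reads:
        # Collect the mismatches carried by this single read.
        variants = [
            (pos, base)
            for pos, (base, ref_base) in enumerate(zip(read, reference))
            if base != ref_base
        ]
        # Count every pair of mismatches linked on the same read.
        for pair in combinations(variants, 2):
            pair_counts[pair] += 1

    total = len(reads)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_pair_freq
    }


# Toy data (hypothetical): 20 reads over a 10-base reference.
reference = "ACGTACGTAC"
reads = (
    ["ACGTACGTAC"] * 12   # dominant population, identical to the reference
    + ["ATGTACGAAC"] * 6  # minor population: linked variants at positions 1 and 7
    + ["ACCTACGTAA"]      # noisy read: two unrelated random errors
    + ["GCGTACTTAC"]      # noisy read: two other random errors
)

linked = linked_variant_pairs(reads, reference)
for pair, freq in linked.items():
    print(pair, round(freq, 2))
```

In this toy input, the pair of variants carried by the minor population recurs on six of the twenty reads and passes the threshold, while the error pairs on the two noisy reads each appear only once and are discarded. A real pipeline would also have to handle insertions and deletions, which dominate PacBio errors and are ignored here.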

Introduction

HIV strains mutate frequently from one HIV generation to the next, resulting in high genetic diversity of the HIV populations (named “quasispecies”) in a given infected host over a time period [1, 2]. Under certain selective pressures (e.g. antiretroviral treatment), HIV quasispecies with particular characteristics (e.g. drug resistance, high transmissibility) can be propagated [1, 3]. Sequencing and analysis of the HIV quasispecies are important for improving personalized treatment plans, developing early prevention measures, and designing more effective vaccines for patients [1, 3, 4]. We describe an improved de novo assembly procedure to accurately construct HIV quasispecies with high sensitivity. The procedure was successfully applied to HIV benchmark datasets and to real-life samples from HIV relapse patients, leading to early detection of the dynamics of HIV drug-resistant strains.
