Abstract

The combined application of linear amplification-mediated PCR (LAM-PCR) protocols with next-generation sequencing (NGS) has had a large impact on our understanding of retroviral pathogenesis. Considerable effort has previously been expended to optimize NGS methods for exploring the genome-wide distribution of proviral integration sites and the clonal architecture of clinically important retroviruses such as human T-cell leukemia virus type-1 (HTLV-1). Once sequencing data are generated, rigorous bioinformatics analysis is central to their biological interpretation. To better exploit the information available through these methods, we developed an optimized bioinformatics pipeline to analyze NGS clonality datasets. We found that short-read aligners, which are specifically designed to handle NGS datasets, significantly reduce processing time and computational burden while also accounting for sequencing base quality. We demonstrate the utility of an additional trimming step in the workflow, which adjusts for the number of reads supporting each insertion site. In addition, we developed a recall procedure to reduce the bias associated with proviral integration within low-complexity regions of the genome, providing a more accurate estimation of clone abundance. Finally, we recommend applying a “clean-and-recover” step to clonality datasets generated from large cohorts and longitudinal studies. In summary, we report an optimized bioinformatics workflow for NGS clonality analysis and describe a new set of steps to guide the computational process. We demonstrate that applying this protocol to HTLV-1 and bovine leukemia virus (BLV) clonality datasets improves the quality of data processing and provides a more accurate definition of the clonal landscape in infected individuals. The optimized workflow and analysis recommendations can be implemented in the majority of bioinformatics pipelines developed to analyze LAM-PCR-based NGS clonality datasets.
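As a rough illustration of what "the number of reads supporting each insertion site" and "clone abundance" can mean computationally, the following minimal sketch counts reads per integration site, discards sites below a read-support threshold, and reports each remaining site's share of the retained reads. It is not the authors' pipeline: the function name `clone_abundance`, the `min_reads` threshold, the `(chromosome, position, strand)` site representation, and the choice of read counts as the abundance measure are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical representation of an integration site: (chromosome, position, strand).
Site = Tuple[str, int, str]


def clone_abundance(sites: Iterable[Site], min_reads: int = 2) -> Dict[Site, float]:
    """Return the relative abundance of each integration site.

    `sites` holds one entry per mapped read. Sites supported by fewer than
    `min_reads` reads are dropped (an assumed, illustrative filter), and
    abundance is the site's fraction of the reads that remain.
    """
    counts = Counter(sites)
    kept = {site: n for site, n in counts.items() if n >= min_reads}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {site: n / total for site, n in kept.items()}


# Toy input: three sites supported by 6, 3 and 1 reads respectively.
reads = (
    [("chr1", 1000, "+")] * 6
    + [("chr5", 250, "-")] * 3
    + [("chr9", 42, "+")]
)
abundance = clone_abundance(reads, min_reads=2)
for site, fraction in sorted(abundance.items(), key=lambda kv: -kv[1]):
    print(site, round(fraction, 3))
```

In this toy input the singleton site is filtered out before abundances are computed, so the two remaining fractions sum to one. A real workflow would derive the site list from aligner output and would need to choose the support threshold and abundance measure deliberately.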

Highlights

  • A hallmark of all retroviral infections, and an essential step in the retroviral life cycle, is the integration of proviral DNA into the host genome (Demeulemeester et al, 2015)

  • In a longitudinal study of adult T-cell leukemia (ATL) patients, we demonstrated that the optimized next-generation sequencing (NGS) mapping protocol applied to human T-cell leukemia virus type-1 (HTLV-1) outperformed other currently available methods, enabling the detection of patients refractory to first-line therapy and providing a better estimation of response to therapy (Artesi et al, 2017)

  • We demonstrate that application of these steps to HTLV-1 and bovine leukemia virus (BLV) clonality datasets provides a better estimation of proviral integration site distribution and a more accurate picture of the clonal landscape in infected individuals

Introduction

A hallmark of all retroviral infections, and an essential step in the retroviral life cycle, is the integration of proviral DNA into the host genome (Demeulemeester et al, 2015). The application of linear amplification-mediated PCR (LAM-PCR; Schmidt et al, 2002, 2007) and next-generation sequencing (NGS) has facilitated the identification of hundreds of thousands of retroviral integration sites genome-wide while simultaneously measuring the abundance of the corresponding clones (Paruzynski et al, 2010; Gillet et al, 2011; Firouzi et al, 2014; Maldarelli et al, 2014; Wagner et al, 2014; Sunshine et al, 2016; Artesi et al, 2017; Rosewick et al, 2017). Systematic exploration of insertion sites is currently recommended by the FDA as an approach to assess the risk of integration-related effects in gene therapy clinical trials (FDA Center for Biologics Evaluation and Research, 2019).
