Abstract

Several pathogenic viruses such as hepatitis B and human immunodeficiency viruses may integrate into the host genome. These virus/host integrations are detectable using paired-end next generation sequencing. However, the low number of expected true virus integrations may be difficult to distinguish from the noise of many false positive candidates. Here, we propose a novel filtering approach that increases specificity without compromising sensitivity for virus/host chimera detection. Our detection pipeline termed Vy-PER (Virus integration detection bY Paired End Reads) outperforms existing similar tools in speed and accuracy. We analysed whole genome data from childhood acute lymphoblastic leukemia (ALL), which is characterised by genomic rearrangements and usually associated with radiation exposure. This analysis was motivated by the recently reported virus integrations at genomic rearrangement sites and association with chromosomal instability in liver cancer. However, as expected, our analysis of 20 tumour and matched germline genomes from ALL patients finds no significant evidence for integrations by known viruses. Nevertheless, our method eliminates 12,800 false positives per genome (80× coverage) and only our method detects singleton human-phiX174-chimeras caused by optical errors of the Illumina HiSeq platform. This high accuracy is useful for detecting low virus integration levels as well as non-integrated viruses.

Highlights

  • Pipeline overview: next generation sequencing (NGS) paired-end reads from human sample → stringent alignment to human genome → local re-alignment around indels → unmapped reads of partially mapped pairs → discard low-complexity reads → alignment to virus genomes

  • Our aim for high sensitivity was motivated by the knowledge that, for example, human immunodeficiency virus (HIV) and human T-lymphotropic virus (HTLV) have a tropism for T-lymphocytes with CD4 receptors[36,37], leaving most other cell types uninfected, and that viruses such as Epstein-Barr virus (EBV) or HTLV integrate at seemingly random sites and are difficult to detect unless a major clonal expansion takes place within the cell population[14,15,16,37]

  • When we aligned a number of full-length HiSeq reads from herpes candidates to the expected human genome sequence window using an interactive DIALIGN-based tool[51], we found that these reads were from the human genome


Summary

Introduction

Regardless of whether viruses are integrated into the host genome or not, one causal mechanism for cancer development is binding of virus proteins to the tumour suppressor p53, thereby inhibiting apoptosis. This mechanism is exploited, for example, by HBV[17], HPV[18], herpes simplex type 1[19], measles[20], or simian virus type 40[21]. A third causal mechanism was more recently suggested when HBV integration sites were found to recurrently cluster near genomic rearrangement sites and were associated with chromosomal instability (chromothripsis)[23,24,25]. This discovery led to systematic and large-scale virus integration analyses of The Cancer Genome Atlas data[11,26], and of data in many ongoing cancer studies, including the childhood acute lymphoblastic leukemia (ALL) deep sequencing pilot study initiated by the German Federal Office for Radiation Protection (http://goo.gl/q7SaUZ). The resulting specific and efficient analysis pipeline would allow us to routinely scan all of our future genome, transcriptome, and targeted next generation sequencing data for integrations of known viruses before further wet lab tests or computationally more expensive analyses are performed on selected samples.
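Two of the pipeline steps named in the highlights, selecting the unmapped reads of partially mapped pairs and discarding low-complexity reads, can be illustrated with a short sketch. This is not the Vy-PER implementation. It assumes the host alignment is available as SAM text, uses the standard SAM flag bits to find reads that are unmapped while their mate is mapped, and substitutes a simple dinucleotide Shannon entropy cut-off for whatever low-complexity filter the authors actually used. The function names and the `min_entropy=2.5` threshold are hypothetical choices for illustration only.

```python
import math
from collections import Counter

# Flag bits as defined in the SAM specification.
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def is_unmapped_with_mapped_mate(flag: int) -> bool:
    """True for the unmapped read of a pair whose mate aligned to the host."""
    return (
        bool(flag & PAIRED)
        and bool(flag & READ_UNMAPPED)
        and not (flag & MATE_UNMAPPED)
    )


def shannon_entropy(seq: str, k: int = 2) -> float:
    """Shannon entropy (in bits) of the k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_candidates(sam_lines, min_entropy: float = 2.5):
    """Yield (read_name, mate_chrom, mate_pos, seq) for candidate reads.

    Assumption: min_entropy is an illustrative threshold, not a value
    taken from the paper.
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if not is_unmapped_with_mapped_mate(flag):
            continue
        if shannon_entropy(seq.upper()) < min_entropy:
            continue  # discard low-complexity reads
        # RNEXT/PNEXT hold the mate position; "=" means "same as RNAME".
        mate_chrom = fields[2] if fields[6] == "=" else fields[6]
        yield name, mate_chrom, int(fields[7]), seq
```

The reads yielded here would then be aligned to the virus genomes; a read that matches a virus while its mate sits on a human chromosome is a virus/host chimera candidate at the reported mate position.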

