Abstract

Background: Transfer of genetic material from microbes or viruses into the host genome is known as horizontal gene transfer (HGT). The integration of viruses into the human genome is associated with multiple cancers, and these integrations can now be detected using next-generation sequencing methods such as whole-genome sequencing and RNA-sequencing.

Results: We designed a novel computational workflow, HGT-ID, to identify the integration of viruses into the human genome from sequencing data. The HGT-ID workflow follows a four-step procedure: i) pre-processing of unaligned reads, ii) virus detection using a subtraction approach, iii) identification of virus integration sites using discordant and soft-clipped reads, and iv) prioritization of HGT candidates through a scoring function. Annotation and visualization of the events, as well as primer design for experimental validation, are also provided in the final report. We evaluated the tool's performance on well-characterized cervical cancer samples. The HGT-ID workflow accurately detected known human papillomavirus (HPV) integration sites with high sensitivity and specificity compared to previous HGT methods. We applied HGT-ID to The Cancer Genome Atlas (TCGA) whole-genome sequencing (WGS) data from liver tumor-normal pairs. Multiple hepatitis B virus (HBV) integration sites were identified in TCGA liver samples and confirmed by HGT-ID using the RNA-Seq data from the matched liver pairs. This shows that the method is applicable to both data types and that HGT events in liver samples can be cross-validated between them. We also processed WGS data from 220 breast tumors through the workflow; however, no HGT events were detected in those samples.

Conclusions: HGT-ID is a novel computational workflow to detect the integration of viruses in the human genome from sequencing data. It is fast and accurate, with functions such as prioritization, annotation, visualization and primer design for future validation of HGTs.
The HGT-ID workflow is released under the MIT License and available at http://kalarikrlab.org/Software/HGT-ID.html.
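
Step iii of the workflow relies on two read-level signals: soft-clipped reads and discordant read pairs. The following is a minimal Python sketch of how such reads can be recognized from standard SAM fields. It is not code from HGT-ID; the function names, the `min_clip` threshold of 20 bases, and the example reference names are assumptions made for illustration only.

```python
import re

# Illustrative sketch only: none of these names come from HGT-ID.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clip lengths for a SAM CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside soft clips, so ignore them at both ends.
    core = [(n, op) for n, op in ops if op != "H"]
    if not core:
        return (0, 0)  # e.g. "*" for an unaligned read
    lead = core[0][0] if core[0][1] == "S" else 0
    trail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (lead, trail)


def is_candidate_soft_clip(cigar, min_clip=20):
    """True when either end is clipped by at least min_clip bases.

    min_clip=20 is an assumed threshold, not a value taken from the paper.
    """
    return max(soft_clip_lengths(cigar)) >= min_clip


def is_discordant_pair(read_ref, mate_ref, viral_refs):
    """True when exactly one mate of a pair maps to a viral reference."""
    return (read_ref in viral_refs) != (mate_ref in viral_refs)
```

For example, a read with CIGAR `70M30S` has 30 clipped bases at its 3' end and would pass the assumed threshold, and a pair with one mate on a human chromosome and the other on a viral reference would count as discordant. A real pipeline would additionally realign the clipped sequence to the viral genome to locate the breakpoint.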

Highlights

  • Transfer of genetic material from microbes or viruses into the host genome is known as horizontal gene transfer (HGT)

  • In areas of Africa in which Burkitt’s lymphoma is endemic, Epstein-Barr virus (EBV) infection is found in nearly 100% of cases, and one hypothesis is that viral integration into the host genome contributes to the translocation involving the MYC oncogene that is responsible for this disease [18, 19]

  • HGT event detection in simulated data: we compared the performance of HGT-ID, BATVI, and VirusFinder2 on simulated data

Summary

Introduction

Transfer of genetic material from microbes or viruses into the host genome is known as horizontal gene transfer (HGT). Although most known carcinogenic pathogens in humans are believed to work by establishing persistent inflammation [6], some cancer-associated viruses integrate into the genome [7,8,9]. These integrations could potentially disrupt the genome in a manner similar to transposable elements [3]. Various studies have reported integration of the human papillomavirus (HPV) in 80 to 100% of cervical cancers [11,12,13]; integration is associated with reduced survival [11], presumably because it disrupts coding regions important in the regulation of viral genes [14]. In areas of Africa in which Burkitt’s lymphoma is endemic, Epstein-Barr virus (EBV) infection is found in nearly 100% of cases, and one hypothesis is that viral integration into the host genome contributes to the translocation involving the MYC oncogene that is responsible for this disease [18, 19].