Introduction: Viral infections are commonly reported in immunocompromised patients such as hematopoietic stem cell transplant recipients. While there are several reports on the effect of individual viruses, such as CMV, there is increasing evidence that the composition of the blood virome may reflect the immune status of patients. High-resolution virome analysis may therefore provide both an accurate measure of a patient's immune status and pathogen surveillance. Next-generation sequencing (NGS) of cell-free DNA (cfDNA) purified from plasma is being investigated for the assessment of diseases such as graft-versus-host disease, which makes unbiased sequence data readily available for analysis of a patient's virome. Here we present a computational tool (Vira-ome) that probes for the presence of a targeted panel of DNA viruses in NGS data from cfDNA.

Method: We developed a pipeline that first removes host DNA reads and then performs a reference-assisted assembly. The resulting assemblies are passed to a k-mer profile-based taxonomic classifier for annotation to the species level. We trained the classifier at 3 different k-mer values using ∼14,000 DNA-virus genomes representing 36 prevalent and pathogenic species. The predictions of the three resulting classifiers were combined into a final prediction by a majority-wins rule. We tested our method on 30 simulated samples (mixes of 1, 2, or 3 species with 50, 100, or 500 NGS reads each) and 29 clinical samples obtained from a biorepository.

Results: The Vira-ome tool used cfDNA NGS data to screen for the presence of a targeted set of nine pathogenic viruses (human adenovirus, human herpesvirus 1, 2, 6, 7 and 8, BK polyomavirus, human parvovirus B19, JC polyomavirus, KI polyomavirus, WU polyomavirus, torque teno virus, Epstein-Barr virus, human cytomegalovirus, and varicella-zoster virus). On the simulated set of viral sample mixes, our protocol had 100% accuracy with no false-positive predictions.
For the 29 clinical samples, our pipeline predicted: (A) torque teno virus in 13/29 samples, (B) pathogenic viruses in 11/15 cases, predominantly BK polyomavirus (7/15 samples), and (C) at least one viral species in 10/14 cases. Selected samples were then analyzed by qPCR, which confirmed the presence of BKV, JCV, HHV7 and EBV in 8/9, 3/4, 1/1 and 2/2 samples, respectively. Viral loads ranged from 6 – 106 copies/mL. Five samples predicted positive for adenovirus were not confirmed by qPCR, emphasizing the need for in vitro validation of in silico predictions.

Conclusions: The Vira-ome tool, designed to detect pathogenic viruses using cfDNA data, performed well on both simulated and clinical samples, with a majority of results confirmed by qPCR. Our results emphasize how computational predictions can complement clinical diagnostic approaches.
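
The majority-wins rule described in the Method section can be illustrated with a minimal Python sketch. It is not the Vira-ome implementation: the function names `kmer_profile` and `majority_vote` are hypothetical, the abstract does not state which k values were used, and returning no call when no species wins more than half of the votes is an assumption made here for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def kmer_profile(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in an assembled sequence.

    Hypothetical helper: a profile like this is the kind of input a
    k-mer based classifier would consume.
    """
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def majority_vote(calls: Sequence[Optional[str]]) -> Optional[str]:
    """Combine the species calls of several classifiers.

    Returns the species named by more than half of the classifiers.
    Returning None when there is no such species is an assumption;
    the abstract only says a majority-wins rule was applied.
    """
    votes = Counter(call for call in calls if call is not None)
    if not votes:
        return None
    species, count = votes.most_common(1)[0]
    return species if count > len(calls) / 2 else None
```

With three classifiers, a species is reported only when at least two of them agree, so a single divergent call at one k value cannot produce a positive prediction on its own.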