Abstract
Using deep sequencing technologies such as Illumina’s platform, it is possible to obtain reads from the viral RNA population revealing the viral genome diversity within a single host. A range of software tools and pipelines can transform raw deep sequencing reads into Sequence Alignment/Map (SAM) files. We propose that interpretation tools should process these SAM files, directly translating individual reads to amino acids in order to extract statistics of interest such as the proportion of different amino acid residues at specific sites. This preserves per-read linkage between nucleotide variants at different positions within a codon location. The samReporter is a subsystem of the GLUE software toolkit which follows this direct read translation approach in its processing of SAM files. We tested samReporter on a deep sequencing dataset obtained from a cohort of 241 UK HCV patients for whom prior treatment with direct-acting antivirals has failed; deep sequencing and resistance testing have been suggested to be of clinical use in this context. We compared the polymorphism interpretation results of the samReporter against an approach that does not preserve per-read linkage. We found that the samReporter was able to properly interpret the sequence data at resistance-associated locations in nine patients where the alternative approach was equivocal. In three cases, the samReporter confirmed that resistance or an atypical substitution was present at NS5A position 30. In three further cases, it confirmed that the sofosbuvir-resistant NS5B substitution S282T was absent. This suggests the direct read translation approach implemented is of value for interpreting viral deep sequencing data.
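The difference between direct read translation and an approach that ignores per-read linkage can be illustrated with a minimal sketch. This is not the samReporter implementation: the function names `per_read_amino_acids` and `unlinked_amino_acids` are hypothetical, and reads are simplified to (reference start, ungapped sequence) pairs instead of parsed SAM records, so CIGAR strings, base qualities and frequency thresholds are all ignored.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order (an assumption of this sketch:
# DNA alphabet, no ambiguity codes).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def per_read_amino_acids(reads, codon_start):
    """Translate the codon within each read that fully covers it (linkage kept)."""
    counts = Counter()
    for ref_start, seq in reads:
        offset = codon_start - ref_start
        if offset < 0 or offset + 3 > len(seq):
            continue  # read does not span the whole codon
        aa = CODON_TABLE.get(seq[offset:offset + 3])
        if aa is not None:
            counts[aa] += 1
    return counts


def unlinked_amino_acids(reads, codon_start):
    """Combine nucleotides observed at each codon position independently
    (linkage lost), returning every amino acid those combinations could encode."""
    seen = [set(), set(), set()]
    for ref_start, seq in reads:
        for i in range(3):
            offset = codon_start + i - ref_start
            if 0 <= offset < len(seq) and seq[offset] in BASES:
                seen[i].add(seq[offset])
    return {CODON_TABLE["".join(c)] for c in product(*seen)}


# Toy data: two read populations carry different serine codons (AGC and TCC)
# at reference positions 100-102; one short read covers only part of the codon.
reads = [(98, "GGAGCAA")] * 3 + [(99, "GTCCA")] * 3 + [(101, "GCA")]

print(per_read_amino_acids(reads, 100))          # Counter({'S': 6})
print(sorted(unlinked_amino_acids(reads, 100)))  # ['C', 'S', 'T']
```

In this toy data every read that spans the codon encodes serine, yet combining the per-position nucleotides independently also yields ACC (threonine) and TGC (cysteine), codons that no read contains. That is the kind of equivocal result per-read translation avoids.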
Highlights
For some virus species, their highly error-prone replication mechanism produces a population of related genomic variants of the virus within a single infected host individual [1]
We present a subsystem of the GLUE software package [22] called samReporter, focused on the analysis of aligned deep sequencing viral genome data
We demonstrate the benefits of applying the GLUE samReporter to hepatitis C virus (HCV)
Summary
For some virus species, a highly error-prone replication mechanism produces a population of related genomic variants of the virus within a single infected host individual [1]. Sequencing systems such as Illumina’s platform produce short, relatively accurate nucleotide sections of viral genome, often generating thousands of reads for a given genomic location from a single sample [2]. Such deep sequencing technologies offer methods for understanding the nature of viral intra-host diversity. Reads unrelated to the virus genome are removed, and low-quality reads are removed or trimmed.
Viruses 2019, 11, 323; doi:10.3390/v11040323; www.mdpi.com/journal/viruses