Abstract
Background: Infections by RNA viruses such as influenza and HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge in producing effective prevention and treatment strategies is high intra-species genetic diversity. Because different strains may have different biological properties, characterizing this genetic diversity is important to vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes is still a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. As a mutation in one gene can mask the phenotypic effects of a mutation at another locus, clustering these contigs into genome-scale haplotypes is still needed.
Results: We developed a contig binning tool, VirBin, which clusters contigs into different groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes because of their high sequence similarity and the heterogeneous sequencing coverage of RNA viruses. VirBin applies prototype-based clustering to regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing data sets with different haplotype abundance distributions and contig sizes, and also on mock quasispecies sequencing data. Benchmarks against other contig binning tools demonstrated the superior sensitivity and precision of VirBin in contig binning for viral haplotype reconstruction.
Conclusions: In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning.
The source code is available at: https://github.com/chjiao/VirBin.
Highlights
Infections by RNA viruses such as influenza virus and Human Immunodeficiency Virus (HIV) still pose a serious threat to human health despite extensive research on viral diseases.
High genetic diversity within viral populations has been observed in patients with chronic infections by RNA viruses such as Human Immunodeficiency Virus (HIV) and Hepatitis C Virus (HCV) [1,2,3,4,5].
Previous studies have revealed that patients with chronic virus infections, such as Acquired Immune Deficiency Syndrome (AIDS), are often the reservoir of new viral variants, which are likely produced during the replication process [7].
Summary
Infections by RNA viruses such as influenza and HIV still pose a serious threat to human health despite extensive research on viral diseases. High genetic diversity within viral populations has been observed in patients with chronic infections by RNA viruses such as Human Immunodeficiency Virus (HIV) and Hepatitis C Virus (HCV) [1,2,3,4,5]. The genetic diversity could be caused by multiple infections with different strains or by mutations during virus replication inside the host. In the latter case, the high replication rate, coupled with the low fidelity of the viral polymerase in most RNA viruses, results in a group of different but related strains infecting the same host, which is often termed a "quasispecies" [6]. Because different strains can have very different biological properties, such as virulence, transmissibility, and antiviral drug resistance, characterizing the genetic diversity within viral populations is very important for developing effective prevention and treatment strategies.