aBayesQR: A Bayesian Method for Reconstruction of Viral Populations Characterized by Low Diversity.
RNA viruses replicate with high mutation rates, creating closely related viral populations. The heterogeneous virus populations, referred to as viral quasispecies, rapidly adapt to environmental changes, thus adversely affecting the efficiency of antiviral drugs and vaccines. Therefore, studying the underlying genetic heterogeneity of viral populations plays a significant role in the development of effective therapeutic treatments. Recent high-throughput sequencing technologies have provided an invaluable opportunity for uncovering the structure of quasispecies populations. However, accurate reconstruction of viral quasispecies remains difficult due to limited read lengths and the presence of sequencing errors. The problem is particularly challenging when the strains in a population are highly similar, that is, the sequences are characterized by low mutual genetic distances, and is further exacerbated if some of those strains are relatively rare; this is the setting where state-of-the-art methods struggle. In this article, we present a novel viral quasispecies reconstruction algorithm, aBayesQR, that uses a maximum-likelihood framework to infer individual sequences in a mixture from high-throughput sequencing data. The search for the most likely quasispecies is conducted on long contigs that our method constructs from the set of short reads via agglomerative hierarchical clustering; operating on contigs rather than short reads enables identification of close strains in a population and provides computational tractability of the Bayesian method. Results on both simulated and real HIV-1 data demonstrate that the proposed algorithm generally outperforms state-of-the-art methods; aBayesQR particularly stands out when reconstructing a set of closely related viral strains (e.g., quasispecies characterized by low diversity).
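The contig-building step mentioned in the abstract can be illustrated with a small sketch. This is not the aBayesQR implementation: the abstract only states that contigs are built by agglomerative hierarchical clustering, so the read representation (aligned strings padded with `-`), the mismatch-rate distance, the `max_dist` and `min_overlap` thresholds, and the majority-vote consensus are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

GAP = "-"  # marks a reference position the read does not cover (assumed encoding)


def distance(a, b, min_overlap=3):
    """Mismatch rate over positions covered by both sequences.

    Returns infinity when the overlap is too short to compare reliably.
    """
    shared = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if len(shared) < min_overlap:
        return float("inf")
    return sum(x != y for x, y in shared) / len(shared)


def consensus(members):
    """Column-wise majority vote over the reads of one cluster."""
    out = []
    for column in zip(*members):
        bases = [c for c in column if c != GAP]
        out.append(Counter(bases).most_common(1)[0][0] if bases else GAP)
    return "".join(out)


def build_contigs(reads, max_dist=0.0, min_overlap=3):
    """Greedily merge the closest pair of clusters until none is close enough."""
    clusters = [[r] for r in reads]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = distance(consensus(clusters[i]), consensus(clusters[j]), min_overlap)
            if d <= max_dist and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:  # no remaining pair is within the merge threshold
            break
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return sorted(consensus(c) for c in clusters)


# Four short reads drawn from two strains that differ at a single position.
reads = ["ACGTA---", "--GTACGT", "ACGTT---", "--GTTCGT"]
contigs = build_contigs(reads)
```

With a zero-mismatch threshold, reads from the two strains are never merged with each other, which is the property that makes contigs useful for separating close strains. With noisy reads a nonzero `max_dist` would be needed, at the cost of possibly merging strains that differ at very few sites.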
- Research Article
- 10.1371/journal.pcbi.1013360.r008
- Aug 21, 2025
- PLOS Computational Biology
Motivation: Oncoviruses, pathogens known to cause or increase the risk of cancer, include both common viruses such as human papillomaviruses and rarer pathogens such as human T-lymphotropic viruses. Computational methods for detecting viral DNA from data acquired by modern DNA sequencing technologies have enabled studies of the association between oncoviruses and cancers. Those studies are rendered particularly challenging when multiple species of oncovirus are present in a tumor sample. In such scenarios, merely detecting the presence of a sequencing read of viral origin is insufficiently informative; instead, a more precise characterization of the viral content in the sample is required. Results: We address this need with NextVir, to our knowledge the first multi-class viral classification framework that adapts genomic foundation models to detecting and classifying sequencing reads of oncoviral origin. Specifically, NextVir explores several foundation models (DNABERT-S, Nucleotide Transformer, and HyenaDNA) and efficiently fine-tunes them to enable accurate identification of the sequencing reads’ origin. The results demonstrate superior performance of the proposed framework over existing deep learning methods and suggest downstream potential for foundation models in genomics.
- Research Article
7
- 10.1186/s12859-022-05100-3
- Dec 19, 2022
- BMC Bioinformatics
Background: The genomes of SARS-CoV-2 are classified into variants, some of which are monitored as variants of concern (e.g. the Delta variant B.1.617.2 or Omicron variant B.1.1.529). Proportions of these variants circulating in a human population are typically estimated by large-scale sequencing of individual patient samples. Sequencing a mixture of SARS-CoV-2 RNA molecules from wastewater provides a cost-effective alternative, but requires methods for estimating variant proportions in a mixed sample. Results: We propose a new method based on a probabilistic model of sequencing reads, capturing sequence diversity present within individual variants, as well as sequencing errors. The algorithm is implemented in an open source Python program called VirPool. We evaluate the accuracy of VirPool on several simulated and real sequencing data sets from both Illumina and nanopore sequencing platforms, including wastewater samples from Austria and France monitoring the onset of the Alpha variant. Conclusions: VirPool is a versatile tool for wastewater and other mixed-sample analysis that can handle both short- and long-read sequencing data. Our approach does not require pre-selection of characteristic mutations for variant profiles, it is able to use the entire length of reads instead of just the most informative positions, and can also capture haplotype dependencies within a single read.
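Estimating variant proportions from a probabilistic read model is commonly framed as fitting mixture weights. The sketch below uses a generic expectation-maximization loop for that purpose; it is not VirPool's code or API. The function name `estimate_proportions` is hypothetical, and the per-read likelihoods P(read | variant) are assumed to have been computed beforehand by some read model, with every read having a positive likelihood under at least one variant.

```python
def estimate_proportions(likelihoods, iterations=200):
    """Fit mixture weights by EM.

    likelihoods[r][v] is the assumed, precomputed probability of read r
    under variant v. Returns one weight per variant, summing to 1.
    """
    n_variants = len(likelihoods[0])
    weights = [1.0 / n_variants] * n_variants  # start from a uniform mixture
    for _ in range(iterations):
        totals = [0.0] * n_variants
        for row in likelihoods:
            # E-step: posterior probability that this read came from each variant.
            joint = [w * p for w, p in zip(weights, row)]
            norm = sum(joint)
            for v, value in enumerate(joint):
                totals[v] += value / norm
        # M-step: new weights are the average responsibilities.
        weights = [t / len(likelihoods) for t in totals]
    return weights


# Three reads that only fit variant 0 and one read that only fits variant 1.
toy = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
proportions = estimate_proportions(toy)
```

In the toy input every read is unambiguous, so the estimate equals the read counts divided by the total. The EM loop matters when reads are compatible with several variants, which is the typical case for short reads covering conserved regions.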
- Research Article
3
- 10.1093/gigascience/giae065
- Jan 2, 2024
- GigaScience
The large amount and diversity of viral genomic datasets generated by next-generation sequencing technologies pose a set of challenges for computational data analysis workflows, including rigorous quality control, scaling to large sample sizes, and tailored steps for specific applications. Here, we present V-pipe 3.0, a computational pipeline designed for analyzing next-generation sequencing data of short viral genomes. It is developed to enable reproducible, scalable, adaptable, and transparent inference of genetic diversity of viral samples. By presenting 2 large-scale data analysis projects, we demonstrate the effectiveness of V-pipe 3.0 in supporting sustainable viral genomic data science.
- Research Article
10
- 10.1093/gbe/evz069
- Mar 27, 2019
- Genome Biology and Evolution
The high mutation rate of RNA viruses is a double-edged sword. On the one hand, most mutations jeopardize protein function; on the other hand, mutations are needed to fuel adaptation. The relevant question then is the ratio between beneficial and deleterious mutations. To evaluate this ratio, we created a mutant library of the 6K2 gene of tobacco etch potyvirus that contains every possible single-nucleotide substitution. The 6K2 protein anchors the virus replication complex to the network of endoplasmic reticulum membranes. The library was inoculated into the natural host Nicotiana tabacum, allowing competition among all these mutants and selection of those that are potentially viable. We identified 11 nonsynonymous mutations that remain in the viral population at measurable frequencies and evaluated their fitness. Some had fitness values higher than the wild-type and some were deleterious. The effect of these mutations on the structure, transmembrane properties, and function of 6K2 was evaluated in silico. In parallel, the effect of these mutations on infectivity, virus accumulation, symptom development, and subcellular localization was evaluated in the natural host. The α-helix H1 in the N-terminal part of 6K2 turned out to be under purifying selection, while most observed mutations affect the link between transmembrane α-helices H2 and H3, fusing them into a longer helix and increasing its rigidity. In general, these changes are associated with higher within-host fitness and development of milder or no symptoms. This finding suggests that in nature selection upon 6K2 may result from a tradeoff between within-host accumulation and severity of symptoms.
- Research Article
- 10.1007/978-1-0716-4702-8_6
- Feb 24, 2012
- Methods in molecular biology (Clifton, N.J.)
RNA viruses, such as HIV, HCV, and SARS-CoV-2, show high levels of intrahost genetic diversity. Many different haplotypes can be present in a single infection, which can be studied using next-generation sequencing. However, full-length haplotype reconstruction from short reads is computationally challenging due to the presence of low-frequency mutants, as well as sequencing errors. Moreover, reads may not be long enough to span regions between neighboring mutations. Finally, the sequencing depths needed to discover such low-frequency mutants result in large datasets, which require highly efficient algorithms. In this review, we provide an overview of current strategies to address these challenges and identify potential directions for increasing the accuracy and efficiency of viral haplotype reconstruction. Such developments will be key to advancing our understanding of viral evolution, improving treatment strategies, and informing public health interventions.
- Preprint Article
3
- 10.1101/2020.09.29.318642
- Oct 1, 2020
Haplotype assembly and viral quasispecies reconstruction are challenging tasks concerned with analysis of genomic mixtures using sequencing data. High-throughput sequencing technologies generate enormous amounts of short fragments (reads) which essentially oversample components of a mixture; the representation redundancy enables reconstruction of the components (haplotypes, viral strains). The reconstruction problem, known to be NP-hard, boils down to grouping together reads originating from the same component in a mixture. Existing methods struggle to solve this problem with the required level of accuracy and low runtimes; the problem is becoming increasingly challenging as the number and length of the components increase. This paper proposes a read clustering method based on a convolutional auto-encoder designed to first project sequenced fragments to a low-dimensional space and then estimate the probability of the read origin using learned embedded features. The components are reconstructed by finding consensus sequences that agglomerate reads from the same origin. Mini-batch stochastic gradient descent and dimension reduction of reads allow the proposed method to efficiently deal with massive numbers of long reads. Experiments on simulated, semi-experimental and experimental data demonstrate the ability of the proposed method to accurately reconstruct haplotypes and viral quasispecies, often demonstrating superior performance compared to state-of-the-art methods.
- Research Article
2
- 10.3389/fcimb.2021.715143
- Nov 3, 2021
- Frontiers in Cellular and Infection Microbiology
Background: Recently, a growing number of patients who recovered from the novel coronavirus disease 2019 (COVID-19) have later tested positive again for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by reverse transcription-polymerase chain reaction (RT-PCR) testing. The explanation for clinical cases of long-term viral shedding is still controversial, and it remains unclear whether persistent viral shedding means re-infection or recurrence. Methods: Specimens were collected from three COVID-19-confirmed patients, and whole-genome sequencing was performed on these clinical specimens during their first hospital admission with a high viral load of SARS-CoV-2. Laboratory tests were examined and analyzed throughout the whole course of the disease. Phylogenetic analysis was carried out for SARS-CoV-2 haplotypes. Results: We found haplotypes of SARS-CoV-2 co-infection in two COVID-19 patients (YW01 and YW03) with a long period of hospitalization. However, only one haplotype was observed in the other patient with chronic lymphocytic leukemia (YW02), which was verified as one kind of viral haplotype. Patients YW01 and YW02 were admitted to the hospital after being infected with COVID-19 as members of a family cluster, but they had different haplotype characteristics in the early stage of infection; YW01 and YW03 were from different infection sources, yet similar haplotypes were found in both. Conclusion: These findings show that haplotype diversity of SARS-CoV-2 may result in viral adaptation for persistent shedding in multiple recurrences of COVID-19 patients who had met the discharge requirements. However, the correlation between haplotype diversity of the SARS-CoV-2 virus and immune status is not absolute. These findings have important implications for the clinical management strategies for COVID-19 patients with long-term hospitalization or cases of recurrence.
- Research Article
8
- 10.1093/bioinformatics/btab076
- Feb 3, 2021
- Bioinformatics
Many tools can reconstruct viral sequences based on next-generation sequencing reads. Although existing tools effectively recover local regions, their accuracy suffers when reconstructing the whole viral genomes (strains). Moreover, they consume significant memory when the sequencing coverage is high or when the genome size is large. We present WgLink to meet this challenge. WgLink takes local reconstructions produced by other tools as input and patches the resulting segments together into coherent whole-genome strains. We accomplish this using an L0+L1-regularized regression, synthesizing variant allele frequency data with physical linkage between multiple variants spanning multiple regions simultaneously. WgLink achieves higher accuracy than existing tools both on simulated and on real datasets while using significantly less memory (RAM) and fewer CPU hours. Source code and binaries are freely available at https://github.com/theLongLab/wglink. Supplementary data are available at Bioinformatics online.
- Preprint Article
- 10.1101/2021.08.26.457874
- Aug 27, 2021
The availability of millions of SARS-CoV-2 sequences in public databases such as GISAID and EMBL-EBI (UK) allows a detailed study of the evolution, genomic diversity and dynamics of a virus like never before. Here we identify novel variants and sub-types of SARS-CoV-2 by clustering sequences, adapting methods originally designed for haplotyping intra-host viral populations. We assess our results using clustering entropy, the first time it has been used in this context. Our clustering approach reaches lower entropies compared to other methods, and we are able to boost this even further through gap filling and Monte Carlo based entropy minimization. Moreover, our method clearly identifies the well-known Alpha variant in the UK and GISAID datasets, but is also able to detect the much less represented (< 1% of the sequences) Beta (South Africa), Epsilon (California), Gamma and Zeta (Brazil) variants in the GISAID dataset. Finally, we show that each variant identified has high selective fitness, based on the growth rate of its cluster over time. This demonstrates that our clustering approach is a viable alternative for detecting even rare subtypes in very large datasets.
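The abstract does not define clustering entropy, so the following is only one plausible reading, offered as a sketch: the Shannon entropy of the nucleotide distribution in each alignment column, summed over columns within a cluster and averaged over clusters weighted by cluster size. The function names and this exact weighting are assumptions, not the authors' definition.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def cluster_entropy(seqs):
    """Sum of column entropies for equal-length aligned sequences."""
    return sum(column_entropy(col) for col in zip(*seqs))


def clustering_entropy(clusters):
    """Size-weighted average of the per-cluster entropies (assumed definition)."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) * cluster_entropy(c) for c in clusters) / total


# One perfectly homogeneous cluster and one cluster that disagrees in a column.
score = clustering_entropy([["AA", "AA"], ["AC", "AG"]])
```

Under this reading, a lower score means that sequences within each cluster agree more closely, which matches the abstract's use of lower entropy as an indicator of better clustering.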
- Research Article
- 10.1089/cmb.2025.0075
- May 20, 2025
- Journal of computational biology : a journal of computational molecular cell biology
It is estimated that approximately 15% of cancers worldwide can be linked to viral infections. The viruses that can cause or increase the risk of cancer include human papillomavirus, hepatitis B and C viruses, Epstein-Barr virus, and human immunodeficiency virus, to name a few. The computational analysis of the massive amounts of tumor DNA data, whose collection is enabled by the advancements in sequencing technologies, has allowed studies of the potential association between cancers and viral pathogens. However, the high diversity of oncoviral families makes reliable detection of viral DNA difficult, and the training of machine learning models that enable such analysis computationally challenging. We introduce XVir, a data pipeline that deploys a transformer-based deep learning architecture to reliably identify viral DNA present in human tumors. XVir is trained on a mix of sequencing reads coming from viral and human genomes, resulting in a model capable of robust detection of potentially mutated viral DNA across a range of experimental settings. Results on semi-experimental data demonstrate that XVir is able to achieve high classification accuracy, generally outperforming state-of-the-art competing methods. In particular, it retains high accuracy even when faced with diverse viral populations while being significantly faster to train than other large deep learning-based classifiers.
- Book Chapter
9
- 10.1007/978-3-319-56970-3_22
- Jan 1, 2017
RNA viruses replicate with high mutation rates, creating closely related viral populations. The heterogeneous virus populations, referred to as viral quasispecies, rapidly adapt to environmental changes, thus adversely affecting the efficiency of antiviral drugs and vaccines. Therefore, studying the underlying genetic heterogeneity of viral populations plays a significant role in the development of effective therapeutic treatments. Recent high-throughput sequencing technologies have provided an invaluable opportunity for uncovering the structure of quasispecies populations. However, accurate reconstruction of viral quasispecies remains difficult due to limited read lengths and the presence of sequencing errors. The problem is particularly challenging when the strains in a population are highly similar, i.e., the sequences are characterized by low mutual genetic distances, and is further exacerbated if some of those strains are relatively rare; this is the setting where state-of-the-art methods struggle. In this paper, we present a novel viral quasispecies reconstruction algorithm, aBayesQR, that employs a maximum-likelihood framework to infer individual sequences in a mixture from high-throughput sequencing data. The search for the most likely quasispecies is conducted on long contigs that our method constructs from the set of short reads via agglomerative hierarchical clustering; operating on contigs rather than short reads enables identification of close strains in a population and provides computational tractability of the Bayesian method. Results on both simulated and real HIV-1 data demonstrate that the proposed algorithm generally outperforms state-of-the-art methods; aBayesQR particularly stands out when reconstructing a set of closely related viral strains (e.g., quasispecies characterized by low diversity).
- Research Article
37
- 10.1038/embor.2009.61
- Apr 3, 2009
- EMBO reports
The meeting on ‘Quasispecies: past, present and future’ took place between 17 and 18 November 2008, in Barcelona, Spain, and was organized by J. Gomez, C. Lopez‐Galindez, M.A. Martinez & A. Mas. A meeting was held in Barcelona, Spain, in November 2008 to celebrate the 30th anniversary of the publication of the article that described the extensive genetic heterogeneity of bacteriophage Qβ (Domingo et al, 1978), which is considered to mark the beginning of experimental studies on viral quasispecies. This meeting was held at the impressive fifteenth century building of the ancient Hospital de la Santa Creu in the old town of Barcelona, which is now the headquarters of the Institut d'Estudis Catalans, and was attended by C. Weissmann (Jupiter, FL, USA), M. Billeter (Zurich, Switzerland) and E. Domingo (Madrid, Spain), who were three early protagonists of the phage Qβ work at the University of Zurich in the 1970s. Several speakers presented their results on the theoretical aspects of the population dynamics of cells and viruses, the clinical implications of quasispecies, and extensions of the quasispecies concept to cellular genes and prions. The meeting was introduced by A. Mas (Albacete, Spain), who reflected on the increasing impact that quasispecies have had in the scientific literature over the past three decades, and quoted some of the key references on viral quasispecies (Martell et al, 1992; Meyerhans et al, 1989; Najera et al, 1995; Vignuzzi et al, 2006; for a historical review of the impact of quasispecies in virology, see Holland, 2006). The scientific presentations were opened by Weissmann and Domingo, who were the last and first authors of the 1978 paper, respectively. Their talks conveyed the scientific atmosphere of the 1970s (when molecular biology was carried out with few recombinant‐DNA techniques) to a young audience. Nucleic‐acid sequencing was in …
- Conference Article
1
- 10.1109/bibm.2014.6999128
- Nov 1, 2014
The viral quasispecies represent a set of related variants in a virus population (e.g. from an infected patient) that contain similar mutations due to the rapid and mutation-prone replication of viruses. The characterization of viral quasispecies in a highly divergent virus population is of great interest in biomedical research, in particular, to identify virulent and drug-resistant mutations in viral genomes for diagnosis of infectious diseases and targeted drug design. In recent years, next-generation sequencing (NGS) techniques have been widely used for deep sequencing of virus populations, in an attempt to characterize low-abundance viral quasispecies containing specific mutations associated with virulence or drug resistance. However, because of the short length of NGS reads, it remains a challenge to reconstruct viral quasispecies from NGS data. In this paper, we formulate viral quasispecies reconstruction as the vertex coloring problem on a read conflict graph, and then apply heuristic algorithms to solve it. We compared our new algorithms with one existing software tool on three simulated datasets for HIV quasispecies reconstruction. The results showed that our methods can improve the accuracy of inferring the identities and quantities of viral quasispecies in a virus population.
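The vertex-coloring formulation can be sketched as follows. The abstract does not say which heuristics the authors apply, so this example substitutes a standard greedy coloring in order of decreasing degree; the read representation as `(start, sequence)` pairs and the conflict rule (two reads conflict if they overlap and disagree at any shared position) are likewise assumptions.

```python
def conflict(a, b):
    """True if two aligned reads overlap and disagree at a shared position."""
    (start_a, seq_a), (start_b, seq_b) = a, b
    lo = max(start_a, start_b)
    hi = min(start_a + len(seq_a), start_b + len(seq_b))
    return any(seq_a[p - start_a] != seq_b[p - start_b] for p in range(lo, hi))


def color_reads(reads):
    """Greedy coloring of the read conflict graph.

    Reads that receive the same color never conflict with one another,
    so each color class is a candidate strain.
    """
    n = len(reads)
    neighbours = {
        i: {j for j in range(n) if j != i and conflict(reads[i], reads[j])}
        for i in range(n)
    }
    colors = {}
    # Color high-degree reads first; ties keep the input order.
    for i in sorted(range(n), key=lambda k: -len(neighbours[k])):
        used = {colors[j] for j in neighbours[i] if j in colors}
        colors[i] = next(c for c in range(n) if c not in used)
    return [colors[i] for i in range(n)]


# Reads from two strains that differ at reference position 4.
reads = [(0, "ACGTA"), (2, "GTACG"), (0, "ACGTT"), (2, "GTTCG")]
assignment = color_reads(reads)
```

Reads that do not overlap share no edge, so a greedy coloring may place reads from different strains in the same class when coverage has gaps. That limitation is inherent to this simple sketch and is one reason short read length makes the problem hard.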
- Research Article
90
- 10.1016/j.virusres.2016.09.016
- Sep 28, 2016
- Virus Research
Recent advances in inferring viral diversity from high-throughput sequencing data
- Conference Article
- 10.1109/bibm.2017.8218007
- Nov 1, 2017
As Next Generation Sequencing (NGS) technologies continue to expand rapidly, NGS data, available in the form of short genomic reads, remain the primary source of biological data in many bioinformatics applications, and the need to assemble and manipulate such data persists. As a result, many assemblers have been developed to assemble NGS short reads into long genomic sequences or contigs ready for advanced analysis such as genome-wide association studies (GWAS). However, the lack of high levels of robustness and reproducibility continues to limit the impact of bioinformatics research, and many biomedical researchers remain skeptical of results obtained from bioinformatics applications. In this study, we conduct a comparative study of various widely used assemblers and compare their performance using several NGS datasets associated with various organisms. We highlight the advantages and disadvantages of each assembler and explore the factors that impact the performance of each approach. In addition, we survey the assembly-free compression approaches recently developed to process NGS short reads and analyze their performance in comparing genomic sequences represented by sets of short reads. We use phylogeny trees obtained from simulated and real datasets to evaluate the accuracy of each assembly-free approach. We test the hypothesis that non-assembly approaches could potentially overcome the limitations and inaccuracies of assembly approaches in comparing sequences, especially for large read sizes. Moreover, we propose a hybrid approach that integrates both assembly and non-assembly approaches for classifying genomic sequences. The proposed approach incorporates results obtained from partially assembling short reads as input for assembly-free methods to complete the NGS manipulation process. Preliminary results show that the hybrid approach has potential for comparing genomic sequences.
- Research Article
81
- 10.1186/1471-2105-12-5
- Jan 5, 2011
- BMC Bioinformatics
Background: Next-generation sequencing (NGS) offers a unique opportunity for high-throughput genomics and has potential to replace Sanger sequencing in many fields, including de-novo sequencing, re-sequencing, meta-genomics, and characterisation of infectious pathogens, such as viral quasispecies. Although methodologies and software for whole genome assembly and genome variation analysis have been developed and refined for NGS data, reconstructing a viral quasispecies using NGS data remains a challenge. This application would be useful for analysing intra-host evolutionary pathways in relation to immune responses and antiretroviral therapy exposures. Here we introduce a set of formulae for the combinatorial analysis of a quasispecies, given an NGS re-sequencing experiment and an algorithm for quasispecies reconstruction. We require that sequenced fragments are aligned against a reference genome, and that the reference genome is partitioned into a set of sliding windows (amplicons). The reconstruction algorithm is based on combinations of multinomial distributions and is designed to minimise the reconstruction of false variants, called in-silico recombinants. Results: The reconstruction algorithm was applied to error-free simulated data and reconstructed a high percentage of true variants, even at a low genetic diversity, where the chance to obtain in-silico recombinants is high. Results on empirical NGS data from patients infected with hepatitis B virus confirmed its ability to characterise different viral variants from distinct patients. Conclusions: The combinatorial analysis provided a description of the difficulty of reconstructing a quasispecies, given a determined amplicon partition and a measure of population diversity. The reconstruction algorithm showed good performance on both simulated and real data, even in the presence of sequencing errors.
- Front Matter
5
- 10.1111/mec.16884
- Mar 1, 2023
- Molecular Ecology
Long-read sequencing in ecology and evolution: Understanding how complex genetic and epigenetic variants shape biodiversity.
- Research Article
29
- 10.1093/bioinformatics/bty291
- Jun 27, 2018
- Bioinformatics
Motivation: As RNA viruses mutate and adapt to environmental changes, often developing resistance to anti-viral vaccines and drugs, they form an ensemble of viral strains, a viral quasispecies. While high-throughput sequencing (HTS) has enabled in-depth studies of viral quasispecies, sequencing errors and limited read lengths render the problem of reconstructing the strains and estimating their spectrum challenging. Inference of viral quasispecies is difficult due to generally non-uniform frequencies of the strains, and is further exacerbated when the genetic distances between the strains are small. Results: This paper presents TenSQR, an algorithm that utilizes a tensor factorization framework to analyze HTS data and reconstruct viral quasispecies characterized by highly uneven frequencies of its components. Fundamentally, TenSQR performs clustering with successive data removal to infer strains in a quasispecies in order from the most to the least abundant one; every time a strain is inferred, sequencing reads generated from that strain are removed from the dataset. The proposed successive strain reconstruction and data removal enables discovery of rare strains in a population and facilitates detection of deletions in such strains. Results on simulated datasets demonstrate that TenSQR can reconstruct full-length strains having widely different abundances, generally outperforming state-of-the-art methods at diversities 1–10% and detecting long deletions even in rare strains. A study on a real HIV-1 dataset demonstrates that TenSQR outperforms competing methods in experimental settings as well. Finally, we apply TenSQR to analyze a Zika virus sample and reconstruct the full-length strains it contains. Availability and implementation: TenSQR is available at https://github.com/SoYeonA/TenSQR. Supplementary information: Supplementary data are available at Bioinformatics online.
- Research Article
8
- 10.1093/bioinformatics/btt678
- Dec 3, 2013
- Bioinformatics
Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides the high-speed genome mapping function, ISRNA provides statistics for genomic location, length distribution and nucleotide composition bias analysis of sequence reads. Number of reads mapped to known microRNAs and other classes of short non-coding RNAs, coverage of short reads on genes, expression abundance of sequence reads as well as some other analysis functions are also supported. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.
- Research Article
204
- 10.1073/pnas.052712599
- Mar 5, 2002
- Proceedings of the National Academy of Sciences
Despite recent treatment advances, the majority of patients with chronic hepatitis C fail to respond to antiviral therapy. Although the genetic basis for this resistance is unknown, accumulated evidence suggests that changes in the heterogeneous viral population (quasispecies) may be an important determinant of viral persistence and response to therapy. Sequences within hepatitis C virus (HCV) envelope 1 and envelope 2 genes, inclusive of the hypervariable region 1, were analyzed in parallel with the level of viral replication in serial serum samples obtained from 23 patients who exhibited different patterns of response to therapy and from untreated controls. Our study provides evidence that although the viral diversity before treatment does not predict the response to treatment, the early emergence and dominance of a single viral variant distinguishes patients who will have a sustained therapeutic response from those who subsequently will experience a breakthrough or relapse. A dramatic reduction in genetic diversity leading to an increasingly homogeneous viral population was a consistent feature associated with viral clearance in sustained responders and was independent of HCV genotype. The persistence of variants present before treatment in patients who fail to respond or who experience a breakthrough during therapy strongly suggests the preexistence of viral strains with inherent resistance to IFN. Thus, the study of the evolution of the HCV quasispecies provides prognostic information as early as the first 2 weeks after starting therapy and opens perspectives for elucidating the mechanisms of treatment failure in chronic hepatitis C.
- Conference Article
- 10.1109/allerton.2017.8262878
- Oct 1, 2017
Viral quasispecies are heterogeneous mixtures of viral strains generated as RNA viruses mutate and adapt to environmental changes. High-throughput DNA sequencing enables reconstruction of viral quasispecies and estimation of their abundances, thus providing information that assists in the development of effective antiviral drugs and vaccines. In this paper, sequencing data are represented by means of a binary tensor and the discovery of viral strains is formulated as a tensor factorization problem. Performance of the proposed scheme is discussed. Results demonstrate the effectiveness of the proposed algorithm.
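One way to picture a binary tensor of sequencing data is a reads × positions × nucleotides array with a one-hot entry per observed base. The abstract does not spell out the encoding, so the layout below, the `ALPHABET` order, and the convention that uncovered positions (written `-`) become all-zero fibers are assumptions made for illustration.

```python
ALPHABET = "ACGT"  # assumed order of the nucleotide axis


def reads_to_tensor(reads):
    """Encode aligned reads as a nested list indexed [read][position][nucleotide].

    Each covered position holds a one-hot vector; any symbol outside
    ALPHABET (for example '-') yields an all-zero vector.
    """
    return [
        [[1 if base == symbol else 0 for symbol in ALPHABET] for base in read]
        for read in reads
    ]


tensor = reads_to_tensor(["AC-", "-GT"])
```

In this layout the fiber for read 0 at position 1 is `[0, 1, 0, 0]` (a C), and missing entries are exactly the positions a factorization method would need to treat as unobserved rather than as evidence against any base.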
- Research Article
1060
- 10.1038/nature04388
- Dec 4, 2005
- Nature
An RNA virus population does not consist of a single genotype; rather, it is an ensemble of related sequences, termed quasispecies. Quasispecies arise from rapid genomic evolution powered by the high mutation rate of RNA viral replication. Although a high mutation rate is dangerous for a virus because it results in nonviable individuals, it has been hypothesized that high mutation rates create a 'cloud' of potentially beneficial mutations at the population level, which afford the viral quasispecies a greater probability to evolve and adapt to new environments and challenges during infection. Mathematical models predict that viral quasispecies are not simply a collection of diverse mutants but a group of interactive variants, which together contribute to the characteristics of the population. According to this view, viral populations, rather than individual variants, are the target of evolutionary selection. Here we test this hypothesis by examining the consequences of limiting genomic diversity on viral populations. We find that poliovirus carrying a high-fidelity polymerase replicates at wild-type levels but generates less genomic diversity and is unable to adapt to adverse growth conditions. In infected animals, the reduced viral diversity leads to loss of neurotropism and an attenuated pathogenic phenotype. Notably, using chemical mutagenesis to expand quasispecies diversity of the high-fidelity virus before infection restores neurotropism and pathogenesis. Analysis of viruses isolated from brain provides direct evidence for complementation between members in the quasispecies, indicating that selection indeed occurs at the population level rather than on individual variants. Our study provides direct evidence for a fundamental prediction of the quasispecies theory and establishes a link between mutation rate, population dynamics and pathogenesis.
- Research Article
13
- 10.1093/bioinformatics/btaa782
- Sep 14, 2020
- Bioinformatics
RNA viruses exhibit a high mutation rate and thus they exist in infected cells as a population of closely related strains called viral quasispecies. The viral quasispecies assembly problem asks us to characterize the quasispecies present in a sample from high-throughput sequencing data. We study the de novo version of the problem, where reference sequences of the quasispecies are not available. Current methods for assembling viral quasispecies are either based on overlap graphs or on de Bruijn graphs. Overlap graph-based methods tend to be accurate but slow, whereas de Bruijn graph-based methods are fast but less accurate. We present viaDBG, which is a fast and accurate de Bruijn graph-based tool for de novo assembly of viral quasispecies. We first iteratively correct sequencing errors in the reads, which allows us to use large k-mers in the de Bruijn graph. To incorporate the paired-end information in the graph, we also adapt the paired de Bruijn graph for viral quasispecies assembly. These features enable the use of long-range information in contig construction without compromising the speed of de Bruijn graph-based approaches. Our experimental results show that viaDBG is both accurate and fast, whereas previous methods are either fast or accurate but not both. In particular, viaDBG has comparable or better accuracy than SAVAGE, while being at least nine times faster. Furthermore, the speed of viaDBG is comparable to that of PEHaplo, but viaDBG can also retrieve low-abundance quasispecies, which are often missed by PEHaplo. viaDBG is implemented in C++ and is publicly available at https://bitbucket.org/bfreirec1/viadbg. All datasets used in this article are publicly available at https://bitbucket.org/bfreirec1/data-viadbg/. Supplementary data are available at Bioinformatics online.
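As a rough illustration of the de Bruijn graph idea referred to in the abstract above, the sketch below builds a plain (unpaired) de Bruijn graph from a few toy reads. It is not viaDBG's implementation: the error-correction step and the paired de Bruijn graph are omitted, and the function name `build_de_bruijn` and the example reads are assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy reads: the third differs from the first at a single position.
reads = ["ACGTAC", "CGTACC", "ACGTTC"]
graph = build_de_bruijn(reads, k=4)

# Nodes with more than one successor mark points where sequences diverge,
# caused either by distinct strains or by uncorrected sequencing errors.
branching = sorted(node for node, nxt in graph.items() if len(nxt) > 1)
```

With these reads the only branching node is `CGT`, where the paths `GTA` and `GTT` split. Larger values of `k` reduce spurious branches from repeats but need cleaner reads, which is consistent with the abstract's remark that error correction allows large k-mers.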
- Research Article
89
- 10.1016/j.jmb.2010.02.005
- Feb 10, 2010
- Journal of Molecular Biology
Unfinished Stories on Viral Quasispecies and Darwinian Views of Evolution
- Book Chapter
8
- 10.1007/978-94-007-4899-6_2
- Jan 1, 2012
- Viruses: Essential Agents of Life
RNA viruses, such as human immunodeficiency virus, hepatitis C virus, influenza virus, and poliovirus, replicate with very high mutation rates and exhibit very high genetic diversity. This extremely high genetic diversity arises because RNA virus populations replicate as complex mutant spectra known as viral quasispecies. The quasispecies dynamics of RNA viruses are closely related to viral pathogenesis and disease, and to antiviral treatment strategies. Over the past several decades, the quasispecies concept has been expanded to provide an adequate framework to explain the complex behavior of RNA virus populations. Recently, the quasispecies concept has been used to study other complex biological systems, such as tumor cells, bacteria, and prions. Here, we focus on some questions regarding viral and theoretical quasispecies concepts, as well as more practical aspects connected to pathogenesis and resistance to antiviral treatments. A better knowledge of virus diversification and evolution may be critical in preventing and treating the spread of pathogenic viruses.