The complexity landscape of viral genomes.

Jorge Miguel Silva, Diogo Pratas, Tânia Caetano, Sérgio Matos

doi:10.1093/gigascience/giac079


Open Access


Journal: GigaScience
Publication Date: Aug 11, 2022
Citations: 6
License type: CC BY 4.0

Affiliation: University of Aveiro, University of Helsinki

Abstract

Background: Viruses are among the shortest yet highly abundant species, harboring minimal instructions to infect cells, adapt, multiply, and exist. However, despite the current substantial availability of viral genome sequences, the scientific literature lacks a complexity landscape that automatically sheds light on the organization, relations, and fundamental characteristics of viral genomes.

Results: This work provides a comprehensive landscape of viral genome complexity (or quantity of information), identifying the most redundant and the most complex groups with respect to their genome sequence and describing their distribution and characteristics at both large and local scales. Moreover, we identify and quantify the abundance of inverted repeats in viral genomes. For this purpose, we measure the sequence complexity of each available viral genome using data compression, demonstrating that adequate data compressors can efficiently quantify the complexity of viral genome sequences, including subsequences better represented by algorithmic sources (e.g., inverted repeats). Using a state-of-the-art genomic compressor on an extensive database of viral genomes, we show that double-stranded DNA viruses are, on average, the most redundant viruses, while single-stranded DNA viruses are the least redundant. In contrast, double-stranded RNA viruses show lower redundancy than single-stranded RNA viruses. Furthermore, we extend the ability of data compressors to quantify local complexity (or information content) in viral genomes using complexity profiles, providing for the first time a direct complexity analysis of human herpesviruses. We also devise a feature-based classification methodology that can accurately distinguish viral genomes at different taxonomic levels without direct comparisons between sequences. This methodology combines data compression with simple measures such as GC-content percentage and sequence length, followed by machine learning classifiers.

Conclusions: This article presents methodologies and findings that are highly relevant for understanding the patterns of similarity and singularity between viral groups, opening new frontiers for studying the organization of viral genomes while depicting the complexity trends and classification components of these genomes at different taxonomic levels. The whole study is supported by an extensive website (https://asilab.github.io/canvas/) for exploring viral genome characterization through dynamic and interactive approaches.
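As a rough illustration of the kind of per-genome features the abstract describes (a compression-based complexity estimate, GC-content, and sequence length), the following minimal Python sketch may help. It is not the authors' pipeline: it assumes the general-purpose `lzma` compressor from the standard library as a stand-in for the specialized genomic compressor used in the study, and the function names `reverse_complement`, `gc_content`, `normalized_compression`, and `feature_vector` are hypothetical. The normalization by 2 bits per base (log2 of a 4-symbol alphabet) is likewise an assumption about how such a measure could be scaled.

```python
import lzma
import random

# Complement table for the DNA alphabet (assumes uppercase A, C, G, T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, the form an inverted repeat takes."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def normalized_compression(seq: str) -> float:
    """Compressed size in bits divided by 2 bits per base.

    Lower values suggest more redundancy. lzma is a general-purpose
    stand-in (assumption); it adds container overhead and does not model
    inverted repeats, so short sequences can score above 1.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    compressed_bits = 8 * len(lzma.compress(seq.encode("ascii")))
    return compressed_bits / (2 * len(seq))


def feature_vector(seq: str) -> dict:
    """Bundle the three simple features named in the abstract."""
    return {
        "length": len(seq),
        "gc": gc_content(seq),
        "nc": normalized_compression(seq),
    }


if __name__ == "__main__":
    random.seed(0)
    repetitive = "ACGT" * 1000
    shuffled = "".join(random.choice("ACGT") for _ in range(4000))
    print("repetitive:", feature_vector(repetitive))
    print("random:    ", feature_vector(shuffled))
```

Feature vectors of this kind could then be passed to any standard classifier. A general-purpose compressor will typically underestimate redundancy that a dedicated genomic compressor can capture, such as inverted repeats, so the values here are only indicative.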

Similar Papers

National Center for Biotechnology Information Viral Genomes Project.
Yiming Bao ... Mikhail Rozanov
Journal of Virology | VOL. 78
25 Jun 2004

Towards quantitative viromics for both double-stranded and single-stranded DNA viruses.
Simon Roux ... Matthew B Sullivan
PeerJ | VOL. 4
08 Dec 2016

Nine new RNA viruses associated with the fire ant Solenopsis invicta from its native range.
Steven M Valles ... Adam R Rivers
Virus Genes | VOL. 55
07 Mar 2019

Integration of viral DNA sequences in cells transformed by adenovirus 2 or SV40.
...
Proceedings of the Royal Society of London. Series B. Biological Sciences | VOL. 210
19 Nov 1980
