Performance Analysis of Cross-Assembly of Metatranscriptomic Datasets in Viral Community Studies

Yu.S. Bukin, T.V. Butina, A.N. Bondaryuk

doi:10.17537/2023.18.418

Abstract

We conducted a comparative analysis of individual and cross-assemblies of metatranscriptomic datasets for the study of viral communities, using several metatranscriptomes of endemic Baikal mollusks. Compared with assembling each dataset individually, cross-assembly followed by Hidden Markov Model (HMM)-based virus identification increases the number of viral contigs (or scaffolds) per sample, the number of virotypes identified, and the average scaffold length per sample. The proportion of reads assembled into viral scaffolds, relative to the total number of reads in each sample, is also higher with cross-assembly. De novo cross-assembly combined with an HMM-based virus identification algorithm yields a table that lists, for each scaffold, the number of reads from each sample. This table allows samples to be compared on the representation of all viral scaffolds, including those that are not taxonomically identified, i.e., those with no analogues in the NCBI RefSeq database. Cross-assembly therefore makes it possible to carry out comparative analyses that take the latent diversity of viruses into account. We propose a pipeline for metatranscriptomic data analysis that uses de novo cross-assembly to study viral diversity.
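The central output described above, a table of per-sample read counts for every scaffold of a shared cross-assembly, can be illustrated with a minimal Python sketch. It assumes that reads have already been mapped to viral scaffolds upstream and reduced to hypothetical (sample, scaffold) pairs. The function names, the toy data, and the use of Bray-Curtis dissimilarity to compare samples are illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict

# Hypothetical input: one (sample, scaffold) pair per read that mapped to a
# viral scaffold from the shared cross-assembly. Read mapping itself is
# assumed to have been done upstream and is not shown here.


def build_count_table(assignments):
    """Return {scaffold: {sample: read_count}} from (sample, scaffold) pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for sample, scaffold in assignments:
        table[scaffold][sample] += 1
    # Convert to plain dicts so later lookups do not silently create entries.
    return {scf: dict(counts) for scf, counts in table.items()}


def viral_read_fraction(table, total_reads):
    """Fraction of each sample's reads that were assigned to viral scaffolds."""
    viral = defaultdict(int)
    for counts in table.values():
        for sample, n in counts.items():
            viral[sample] += n
    return {s: viral[s] / total_reads[s] for s in total_reads}


def bray_curtis(table, a, b):
    """Bray-Curtis dissimilarity between samples a and b over all scaffolds,
    including scaffolds that have no taxonomic assignment."""
    shared = sum_a = sum_b = 0
    for counts in table.values():
        x, y = counts.get(a, 0), counts.get(b, 0)
        shared += min(x, y)
        sum_a += x
        sum_b += y
    if sum_a + sum_b == 0:
        return 0.0  # both samples empty: treat as identical
    return 1.0 - 2.0 * shared / (sum_a + sum_b)


# Toy data invented for illustration: scf2 and scf3 stand in for scaffolds
# with no RefSeq match; they still contribute to the sample comparison.
reads = [("S1", "scf1"), ("S1", "scf1"), ("S1", "scf2"),
         ("S2", "scf1"), ("S2", "scf3"), ("S2", "scf3")]
totals = {"S1": 10, "S2": 12}

table = build_count_table(reads)
print(table)
print(viral_read_fraction(table, totals))
print(round(bray_curtis(table, "S1", "S2"), 3))
```

Because every sample is mapped against the same set of scaffolds, the rows of the table are directly comparable across samples whether or not a scaffold has a taxonomic label. Bray-Curtis is only one common abundance-based choice; with samples of very different sequencing depth, normalizing counts before computing dissimilarities may be preferable.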

Full Text