Long-read RNA sequencing analysis of the lytic human cytomegalovirus transcriptome

Zsolt Balázs

doi:10.14232/phd.10113

Abstract

Introduction. Human cytomegalovirus (HCMV) is a ubiquitous herpesvirus with a complex transcriptome. Polycistronism and alternative splicing make it particularly challenging to build accurate transcript models. Long-read sequencing is a powerful novel tool that can distinguish between isoforms and resolve a complex transcriptome. To gain better insight into the transcriptional repertoire of the virus, we sequenced the lytic HCMV transcriptome on multiple third-generation sequencing platforms. Our main objectives were to determine exon connectivity and to annotate the lytic transcriptome of the virus. To exploit the power of long-read sequencing, we developed a pipeline that is suited to the analysis of long-read RNA sequencing data and can compare results obtained from different sequencing platforms. We also aimed to characterize the performance of each sequencing platform and library preparation method based on its ability to sequence full-length, genuine transcripts.

Materials and Methods. Two biologically independent samples were sequenced. The first sample was subjected to cDNA sequencing on the Pacific Biosciences (PacBio) RSII and Sequel platforms, as well as cDNA and direct RNA (dRNA) sequencing on the Oxford Nanopore Technologies (ONT) MinION platform. The second sample was used for cap-selected cDNA sequencing on the MinION platform. The data were analysed with a custom pipeline that uses the biopython and pysam modules and the bedtools software. Custom scripts were written to generate read statistics, to characterize transcripts and to compare results.

Results. Over 80,000 cDNA reads were obtained from the two PacBio platforms and over 1,000,000 cDNA reads from the MinION platform. Direct RNA sequencing yielded 36,195 reads, which were used to validate the cDNA sequencing results. We have created a pipeline for the analysis of long-read RNA sequencing data that accepts mapped sequencing reads produced by any long-read sequencing platform and outputs a transcriptome annotation based on the sequenced reads. We detected 440 isoforms in our dataset, 377 of which were novel. The novel transcripts include transcription start site (TSS) isoforms, transcription end site (TES) isoforms and alternatively spliced isoforms of known genes, antisense transcripts and a novel intergenic transcript in the short repeat region. Many of the transcript isoforms differed from each other by only a few nucleotides; interestingly, however, most isoforms differed from each other in the combination of ORFs that they contained.

Discussion. Our results have more than doubled the number of annotated HCMV transcripts. Cross-platform validation lends these novel features high confidence. Using long-read RNA sequencing data, we were able to draw a more detailed map of the HCMV transcriptome, which is instrumental both for the analysis of viral gene expression and for understanding the molecular mechanisms of infection. Long-read RNA sequencing has uncovered many new isoforms in every organism to which it has been applied. The biological function of most of these isoforms is currently unknown. However, our results show that many of the isoforms have distinct coding potentials, meaning that they code for different peptides or express upstream ORFs, which may play a regulatory role during translation. As long-read sequencing technologies advance, bioinformatics tools that can analyse such data are becoming increasingly important. We developed a pipeline that can rapidly process long-read RNA sequencing data from different platforms and create a transcriptome annotation that users with no bioinformatics background can utilize.
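The abstract describes a pipeline that takes mapped long reads and outputs a transcriptome annotation, but it does not spell out how reads are grouped into isoforms. The following is only a minimal sketch of one plausible grouping step, not the author's implementation. It assumes each read has already been reduced to a sorted list of aligned blocks (0-based, half-open coordinates) and a strand; with pysam, such blocks could be obtained from `AlignedSegment.get_blocks()`. The function names and the `min_intron`, `window` and `min_support` thresholds are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Block = Tuple[int, int]           # aligned block as (start, end), 0-based, half-open
Read = Tuple[List[Block], str]    # (sorted aligned blocks, strand)


def introns_from_blocks(blocks: List[Block], min_intron: int = 50) -> Tuple[Block, ...]:
    """Return gaps between consecutive blocks that are at least min_intron long.

    Shorter gaps are treated as alignment deletions, not introns (an assumption).
    """
    introns = []
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        if next_start - prev_end >= min_intron:
            introns.append((prev_end, next_start))
    return tuple(introns)


def isoform_key(blocks: List[Block], strand: str, window: int = 10) -> tuple:
    """Build a grouping key from strand, binned read ends and the exact intron chain.

    Binning the ends is a crude tolerance: positions that fall on either side
    of a bin boundary still end up in different groups.
    """
    start = blocks[0][0]
    end = blocks[-1][1]
    return (strand, start // window, end // window, introns_from_blocks(blocks))


def count_isoforms(reads: Iterable[Read], min_support: int = 2) -> Dict[tuple, int]:
    """Count reads per isoform key and keep keys seen at least min_support times."""
    counts = Counter(isoform_key(blocks, strand) for blocks, strand in reads)
    return {key: n for key, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    example_reads = [
        ([(100, 200), (300, 400)], "+"),   # spliced read
        ([(102, 200), (300, 405)], "+"),   # same introns, ends differ slightly
        ([(100, 400)], "+"),               # unspliced read over the same region
    ]
    for key, n in count_isoforms(example_reads).items():
        print(key, n)
```

In this sketch the spliced and unspliced reads are kept apart because their intron chains differ, while the two spliced reads are merged despite their slightly different ends. Requiring a minimum number of supporting reads is one simple way to discard spurious groups; validating candidates across platforms, as the abstract describes, would be a separate step.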
