Identifying similar transcripts in a related organism from de Bruijn graphs of RNA-Seq data, with applications to the study of salt and waterlogging tolerance in Melilotus

Shuhua Fu, Natasha L Teakle, Aaron M Tarone, Peter L Chang, Maren L Friesen, Sing-Hoi Sze

doi:10.1186/s12864-019-5702-5

Abstract

Background: A popular strategy to study alternative splicing in non-model organisms starts from sequencing the entire transcriptome, then assembling the reads by using de novo transcriptome assembly algorithms to obtain predicted transcripts. A similarity search algorithm is then applied to a related organism to infer possible function of these predicted transcripts. While some of these predictions may be inaccurate and transcripts with low coverage are often missed, we observe that it is possible to obtain a more complete set of transcripts to facilitate possible functional assignments by starting the search from the intermediate de Bruijn graph that contains all branching possibilities.

Results: We develop an algorithm to extract similar transcripts in a related organism by starting the search from the de Bruijn graph that represents the transcriptome instead of from predicted transcripts. We show that our algorithm is able to recover more similar transcripts than existing algorithms, with large improvements in obtaining longer transcripts and a finer resolution of isoforms. We apply our algorithm to study salt and waterlogging tolerance in two Melilotus species by constructing new RNA-Seq libraries.

Conclusions: We have developed an algorithm to identify paths in the de Bruijn graph that correspond to similar transcripts in a related organism directly. Our strategy bypasses the transcript prediction step in RNA-Seq data and makes use of support from evolutionary information.

Highlights

A popular strategy to study alternative splicing in non-model organisms starts from sequencing the entire transcriptome, then assembling the reads by using de novo transcriptome assembly algorithms to obtain predicted transcripts
Initial choice of contigs to extend: for each transcript in a related organism, our goal is to recover the best path in the de Bruijn graph that corresponds to the transcript
We validate our algorithm on model organisms by applying BLAST to a database of annotated transcripts in each model organism itself and in two other related model organisms with varying evolutionary distances, including Schizosaccharomyces pombe against another yeast species Saccharomyces cerevisiae and another fungus Neurospora crassa, Drosophila melanogaster against another Drosophila species Drosophila pseudoobscura and mosquito Anopheles gambiae, Homo sapiens against squirrel monkey

Summary

Introduction

A popular strategy to study alternative splicing in non-model organisms starts from sequencing the entire transcriptome, then assembling the reads by using de novo transcriptome assembly algorithms to obtain predicted transcripts.

As the advance in high-throughput sequencing enables the generation of large volumes of genomic information, it provides researchers the opportunity to study non-model organisms even in the absence of a fully sequenced genome. These studies often start from sequencing the entire transcriptome, while additional software is applied [...]. [...] alternative splicing, which is crucial to a variety of biological functions. The goal of these studies is to recover as many isoforms as possible in order to understand the underlying biological processes.

In the presence of a reference database, there are two strategies for analyzing transcriptome data. [...] A popular strategy of transcriptome assembly algorithms is to assemble the reads by obtaining a de Bruijn graph that represents the transcriptome [12,13,14,15].
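To make the idea of a de Bruijn graph that "contains all branching possibilities" concrete, the following is a minimal sketch in Python. It is not the authors' algorithm and not the procedure of any particular assembler. The function names `build_de_bruijn` and `enumerate_paths`, the toy reads, and the choice k = 4 are all assumptions made for illustration. Real assemblers additionally handle reverse complements, sequencing errors, and coverage, none of which is modeled here.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer successors seen in the reads.

    Hypothetical helper for illustration only: no error correction,
    no reverse complements, no coverage counts.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: its prefix node -> its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def enumerate_paths(graph, start, end, max_nodes=50):
    """Yield the sequence spelled by every simple path from start to end."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            # Spell the path: first node in full, then the last base of each later node.
            yield path[0] + "".join(n[-1] for n in path[1:])
            continue
        if len(path) >= max_nodes:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append((nxt, path + [nxt]))


# Toy reads (assumed data): two variants that share a prefix and a suffix.
reads = ["ACGTTGCA", "ACGTAGCA"]
graph = build_de_bruijn(reads, k=4)
paths = sorted(enumerate_paths(graph, "ACG", "GCA"))
print(paths)
```

In this toy graph the node `CGT` has two successors, so both variants remain reachable as separate paths. A search that starts from the graph rather than from a single predicted transcript can therefore still consider either one.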

Objectives

Methods

Results

Conclusion