STAble: a novel approach to de novo assembly of RNA-seq data and its application in a metabolic model network based metatranscriptomic workflow

Igor Saggese, Flavio Mignone, Pietro Liò, Elisa Bona, Giovanni Manzini, Francesco Favero, Marco Ladetto, Max Conway

doi:10.1186/s12859-018-2174-6

Abstract

Background: De novo assembly of RNA-seq data allows the transcriptome to be studied in the absence of a reference genome, whether the data are obtained from a single organism or from a mixed sample, as in metatranscriptomics studies. Given the high number of sequences obtained from NGS approaches, a critical step in any analysis workflow is the assembly of reads to reconstruct transcripts, which reduces the complexity of the analysis. Although many available tools show good sensitivity, they produce a high percentage of false positives because of the high number of assemblies considered, and the frequency of false positives is likely underestimated by currently used benchmarks. Reconstructing non-existent transcripts may distort the biological interpretation of results; for example, it may inflate the number of “novel” transcripts identified. Moreover, benchmarks are usually based on RNA-seq data from annotated genomes, and assembled transcripts are compared with annotations and genomes to identify putatively correct and incorrect reconstructions. These tests alone may lead to a particular type of false positive being accepted as true, as described in more detail below.

Results: Here we present a novel methodology for de novo assembly, implemented in a software tool named STAble (Short-reads Transcriptome Assembler). The novel concept of this assembler is that whole reads, rather than smaller k-mers, are used to determine possible alignments, with the aim of reducing the number of chimeras produced. Furthermore, we applied a new set of benchmarks based on simulated data to better characterize the performance of the assembly method and to carefully identify true reconstructions. STAble was also used to build a prototype workflow for analysing metatranscriptomics data in connection with a steady-state metabolic modelling algorithm. This algorithm was used to produce high-quality metabolic interpretations of small gene expression sets obtained from previously published RNA-seq data that we assembled with STAble.

Conclusions: The presented results, albeit preliminary, clearly suggest that with this approach it is possible to identify informative reactions not directly revealed by raw transcriptomic data.
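The contrast between whole-read and k-mer-based evidence mentioned in the abstract can be illustrated with a toy sketch. This is not STAble's implementation: the function names, the toy reads, and the `min_len` threshold are all assumptions made for illustration, and a simple exact suffix-prefix match stands in for whatever alignment procedure the assembler actually uses. The sketch only shows why two reads that share a short k-mer do not necessarily overlap end to end.

```python
def suffix_prefix_overlap(a, b, min_len=4):
    """Return the length of the longest suffix of `a` that exactly matches
    a prefix of `b`, or 0 if no such match of at least `min_len` exists."""
    max_len = min(len(a), len(b))
    # Try the longest candidate overlap first, then shrink.
    for length in range(max_len, max(min_len, 1) - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def kmers(read, k):
    """Return the set of all substrings of length `k` in `read`."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


# Hypothetical toy reads, chosen only to illustrate the point.
read_a = "ACGTTGCA"
read_b = "TTGCAGGT"  # its prefix "TTGCA" matches the suffix of read_a
read_c = "CCTTGCCC"  # shares the 4-mer "TTGC" with read_a, but only internally

# Whole-read evidence: read_a can be extended by read_b, but not by read_c.
overlap_ab = suffix_prefix_overlap(read_a, read_b)
overlap_ac = suffix_prefix_overlap(read_a, read_c)

# k-mer evidence: read_a and read_c still appear connected through "TTGC".
shared_ac = kmers(read_a, 4) & kmers(read_c, 4)

print(overlap_ab, overlap_ac, shared_ac)
```

In this toy case a k-mer-based graph would link `read_a` and `read_c` through the shared 4-mer, whereas the whole-read comparison finds no end-to-end overlap between them. Spurious links of this kind are one way chimeric reconstructions can arise.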

Highlights

De novo assembly of RNA sequencing (RNA-seq) data allows the transcriptome to be studied in the absence of a reference genome, whether the data are obtained from a single organism or from a mixed sample, as in metatranscriptomics studies.
Among the many applications of Next Generation Sequencing (NGS) [1], two techniques can be applied to the “omic” study of transcripts: RNA-seq [2], which profiles the transcriptome of a single organism, and metatranscriptomics, which profiles the transcriptomes of a complex microbial community.
The first field is more established; it makes it possible to assess the presence of RNA transcripts in a biological sample at a given moment and to quantify them. The latter is a more recent and less explored approach related to metagenomics studies: while metagenomics aims to identify species, metatranscriptomics tries to characterize functionally active bacteria and their metabolic interactions by identifying the expressed transcripts.

Summary

Introduction

De novo assembly of RNA-seq data allows the transcriptome to be studied in the absence of a reference genome, whether the data are obtained from a single organism or from a mixed sample, as in metatranscriptomics studies. Single-organism RNA-seq is the more established field; it makes it possible to assess the presence of RNA transcripts in a biological sample at a given moment and to quantify them. Metatranscriptomics is a more recent and less explored approach related to metagenomics studies: while metagenomics aims to identify species, metatranscriptomics tries to characterize functionally active bacteria and their metabolic interactions by identifying the expressed transcripts. Most of the evidence accumulated so far is linked to the role of specific species, genera or families rather than to their metabolic output. While this might be optimal in terms of impact on immune recognition, immune education and the triggering of autoimmune processes, this approach may be insufficient to fully elucidate the impact of microbial communities on processes such as metabolic diseases, inflammatory response, and nutrient availability, which are potentially more strictly related to the global metabolic output than to the phylogeny of the species composing a specific microbiota.

Journal: BMC Bioinformatics
Publication Date: Jul 1, 2018
Citations: 3
License type: open-access
