More Accurate Transcript Assembly via Parameter Advising.

Dan DeBlasio, Carl Kingsford, Kwanho Kim

doi:10.1089/cmb.2019.0286

Abstract

Computational tools used for genomic analyses are becoming more accurate but also increasingly sophisticated and complex. This introduces a new problem in that these pieces of software have a large number of tunable parameters that often have a large influence on the results that are reported. We quantify the impact of parameter choice on transcript assembly and take some first steps toward generating a truly automated genomic analysis pipeline by developing a method for automatically choosing input-specific parameter values for reference-based transcript assembly using the Scallop tool. By choosing parameter values for each input, the area under the receiver operator characteristic curve (AUC) when comparing assembled transcripts to a reference transcriptome is increased by an average of 28.9% over using only the default parameter choices on 1595 RNA-Seq samples in the Sequence Read Archive. This approach is general, and when applied to StringTie, it increases the AUC by an average of 13.1% on a set of 65 RNA-Seq experiments from ENCODE. Parameter advisors for both Scallop and StringTie are available on GitHub.

Highlights

As the field of computational biology has matured, there has been a significant increase in the amount of data that need to be processed and a corresponding increase in the reliance of users without computational expertise on the highly complicated programs that perform the analyses.
Our results show that sample-specific parameter vectors are important for developing any genomic pipeline that includes transcriptome assembly as a step.
We begin to answer the question of how to produce transcriptome assemblies effectively for any input without sacrificing quality or expanding manpower. This is done using a combination of parameter tuning through exploration using coordinate ascent and the established method of parameter advising.
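
The coordinate ascent mentioned in the last highlight can be illustrated with a minimal sketch. This is a simplified version that searches over a fixed grid of candidate values per parameter; the function name `coordinate_ascent`, the parameter names, and the `objective` callback (which in practice might score an assembly, for example by AUC against a reference) are all hypothetical and are not taken from the paper's implementation.

```python
from typing import Callable, Dict, Sequence


def coordinate_ascent(
    start: Dict[str, float],
    candidates: Dict[str, Sequence[float]],
    objective: Callable[[Dict[str, float]], float],
    max_sweeps: int = 20,
) -> Dict[str, float]:
    """Improve one parameter at a time until a full sweep yields no gain.

    `candidates` maps each parameter name to the (assumed) discrete values
    to try; `objective` returns a score where higher is better.
    """
    current = dict(start)
    best_score = objective(current)
    for _ in range(max_sweeps):
        improved = False
        for name, values in candidates.items():
            for value in values:
                # Change a single coordinate and keep it only if it helps.
                trial = dict(current)
                trial[name] = value
                score = objective(trial)
                if score > best_score:
                    current, best_score = trial, score
                    improved = True
        if not improved:
            break  # no coordinate could be improved: a local optimum
    return current


# Toy objective standing in for an assembly-quality score.
def toy_objective(p: Dict[str, float]) -> float:
    return -((p["x"] - 3) ** 2) - ((p["y"] + 1) ** 2)


tuned = coordinate_ascent(
    start={"x": 0, "y": 0},
    candidates={"x": [0, 1, 2, 3, 4, 5], "y": [-2, -1, 0, 1, 2]},
    objective=toy_objective,
)
```

Coordinate ascent only guarantees a local optimum, which is one reason a tuned vector for one sample may not transfer to others.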

Summary

INTRODUCTION

As the field of computational biology has matured, there has been a significant increase in the amount of data that need to be processed and a corresponding increase in the reliance of users without computational expertise on the highly complicated programs that perform the analyses. Tuning the parameter choices to increase accuracy for one input does not imply that the results will be improved for all inputs. This means that, for optimum performance, tuning must be repeated for each new piece of data. In the case of high-throughput genomic analysis, this manual procedure is infeasible. For these applications, without some sort of automatic parameter choice system, the defaults must be used. To address the automated parameter choice problem for multiple sequence alignment (MSA), DeBlasio and Kececioglu (2017b) have defined a framework to automatically select the parameter values for an input. This process, called "parameter advising," has been shown to greatly increase the accuracy of MSA without sacrificing wall-clock running time in most cases, and it can readily be applied to new domains. We use the same measure for selecting parameter choices for a given input.
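
The selection step of parameter advising can be sketched as follows, assuming the usual two components named in the section outline: an advisor set (a small collection of candidate parameter vectors) and an advisor estimator (a function that scores an output). The names `advise`, `run_tool`, and `estimate_accuracy`, and the parameter name `param_a`, are hypothetical placeholders and do not correspond to the actual Scallop or StringTie interfaces.

```python
from typing import Callable, Dict, Iterable, Tuple

ParamVector = Dict[str, float]


def advise(
    sample: str,
    advisor_set: Iterable[ParamVector],
    run_tool: Callable[[str, ParamVector], object],
    estimate_accuracy: Callable[[object], float],
) -> Tuple[ParamVector, object, float]:
    """Run the tool once per candidate vector and keep the best-scoring output."""
    best = None
    for params in advisor_set:
        output = run_tool(sample, params)   # e.g. one assembly per vector
        score = estimate_accuracy(output)   # estimator: higher is better
        if best is None or score > best[2]:
            best = (params, output, score)
    if best is None:
        raise ValueError("advisor set is empty")
    return best


# Toy stand-ins: the "tool" returns a number and the estimator passes it through.
def toy_run(sample: str, params: ParamVector) -> float:
    return -((params["param_a"] - 2.0) ** 2)


def toy_estimate(output: object) -> float:
    return float(output)


chosen_params, chosen_output, chosen_score = advise(
    "sample_1",
    [{"param_a": 1.0}, {"param_a": 2.0}, {"param_a": 5.0}],
    toy_run,
    toy_estimate,
)
```

Because each candidate vector is evaluated independently, the runs inside the loop could be executed in parallel, which is consistent with advising adding little wall-clock time.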

Contributions

DEVELOPING A PARAMETER ADVISOR FOR TRANSCRIPT ASSEMBLY

Advisor estimator

Finding an advisor set using coordinate ascent

Assessing the generality of learned parameter vectors

Advising for StringTie

Justification for a reference-based advising metric

CONCLUSIONS

FUNDING INFORMATION


Journal: Journal of computational biology : a journal of computational molecular cell biology	Publication Date: Aug 1, 2020
Citations: 7	License type: cc-by


