De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm.

Kristoffer Sahlin, Paul Medvedev

doi:10.1089/cmb.2019.0299

Abstract

Long-read sequencing of transcripts with Pacific Biosciences (PacBio) Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (to scale) and makes use of quality values (to handle variable error rates). We test isONclust on three simulated and five biological data sets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in terms of overall accuracy, scalability to large data sets, or both.
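To make the two ideas in the abstract concrete (a greedy pass over the reads, and quality values that steer which reads act as cluster representatives), here is a minimal sketch. It is not isONclust's implementation. It assumes that each read is ranked by its expected number of error-free k-mers, computed from Phred scores via P(error) = 10^(-Q/10). Reads are then visited from best to worst, and each one joins the first-best existing cluster whose representative shares enough exact k-mers, otherwise it founds a new cluster. The `Read` type, the exact-k-mer overlap measure, and the `min_shared` threshold are hypothetical simplifications chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    seq: str
    qual: list[int]  # Phred quality score per base (assumed already decoded)


def error_prob(q: int) -> float:
    # Phred definition: Q = -10 * log10(P(error))
    return 10 ** (-q / 10)


def expected_error_free_kmers(read: Read, k: int) -> float:
    # Sum, over every k-mer window, of the probability that all k bases are correct.
    correct = [1 - error_prob(q) for q in read.qual]
    return sum(math.prod(correct[i:i + k]) for i in range(len(correct) - k + 1))


def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(reads: list[Read], k: int = 15, min_shared: float = 0.5) -> list[list[str]]:
    # Hypothetical greedy scheme: highest-quality reads are processed first,
    # so they tend to become cluster representatives.
    order = sorted(reads, key=lambda r: expected_error_free_kmers(r, k), reverse=True)
    clusters: list[tuple[set[str], list[str]]] = []  # (representative k-mers, member names)
    for read in order:
        ks = kmers(read.seq, k)
        best, best_frac = None, 0.0
        for idx, (rep_ks, _) in enumerate(clusters):
            # Fraction of this read's k-mers that also occur in the representative.
            frac = len(ks & rep_ks) / max(1, len(ks))
            if frac > best_frac:
                best, best_frac = idx, frac
        if best is not None and best_frac >= min_shared:
            clusters[best][1].append(read.name)
        else:
            clusters.append((ks, [read.name]))  # this read becomes a new representative
    return [members for _, members in clusters]


if __name__ == "__main__":
    reads = [
        Read("r1", "ACGTTGCAAGGCTTAACCGT", [30] * 20),
        Read("r2", "ACGTTGCAAGGCTTAACCGA", [20] * 20),  # r1 with one substitution, lower quality
        Read("r3", "CCCCCCCCCCCCCCCCCCCC", [40] * 20),  # unrelated sequence
    ]
    print(greedy_cluster(reads, k=4))  # [['r3'], ['r1', 'r2']]
```

Because r1 has higher base quality than r2, r1 is visited first and becomes the representative that r2 joins. This reflects the intuition that error-prone reads should be compared against cleaner ones rather than seed clusters themselves.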

Journal: Journal of Computational Biology
Publication Date: Mar 16, 2020
Citations: 65

Similar Papers

De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality-Value Based Algorithm
Kristoffer Sahlin ... Paul Medvedev
01 Jan 2019

An Analysis Pipeline for Identification of RNA Modification, Alternative Splicing and Polyadenylation Using Third Generation Sequencing
Yuxiang Liufu ... Lin Wu
BIO-PROTOCOL | VOL. 12
01 Jan 2021

Analysis of Transcriptome and Epitranscriptome in Plants Using PacBio Iso-Seq and Nanopore-Based Direct RNA Sequencing.
Liangzhen Zhao ... Anireddy S N Reddy
Frontiers in Genetics | VOL. 10
21 Mar 2019

Plastid Genome Assembly Using Long-read data.
Wenbin Zhou ... Tracey A Ruhlman
Molecular Ecology Resources | VOL. 23
02 Apr 2023
