CoCo: RNA-seq read assignment correction for nested genes and multimapped reads.

Gabrielle Deschamps-Francoeur, Sherif Abou Elela, Michelle S Scott, Vincent Boivin, Bonnie Berger

doi:10.1093/bioinformatics/btz433


Open Access


Abstract

Motivation: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage.

Results: Here we present count corrector (CoCo), a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes and proportionally distributes multimapped reads between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA as validated by PCR and bedgraph comparisons.

Availability and implementation: The CoCo software is an open source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/coco.

Supplementary information: Supplementary data are available at Bioinformatics online.
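The abstract states that CoCo proportionally distributes multimapped reads between repeated sequences. A minimal sketch of one such scheme follows, assuming (this is our assumption, not a detail given on this page) that each multimapped read carries a total weight of 1 that is split among its candidate genes in proportion to their uniquely mapped read counts, with an even split as a fallback when none of the candidates has unique reads. The function name `distribute_multimapped` and its input format are hypothetical and are not CoCo's API.

```python
from collections import defaultdict


def distribute_multimapped(unique_counts, multimapped_reads):
    """Hypothetical sketch: add fractional multimapped reads to unique counts.

    unique_counts: dict mapping gene id -> number of uniquely assigned reads.
    multimapped_reads: iterable of collections of gene ids, one per read,
        listing every gene the read aligned to.
    Each read contributes a total weight of 1.0, split in proportion to the
    candidates' unique counts (assumed weighting); if all candidates have
    zero unique reads, the read is split evenly among them.
    """
    totals = defaultdict(float)
    for gene, count in unique_counts.items():
        totals[gene] += count

    for candidates in multimapped_reads:
        genes = sorted(set(candidates))
        if not genes:
            continue  # read with no candidate gene: nothing to assign
        weights = [unique_counts.get(g, 0) for g in genes]
        denom = sum(weights)
        for g, w in zip(genes, weights):
            share = w / denom if denom > 0 else 1.0 / len(genes)
            totals[g] += share

    return dict(totals)


if __name__ == "__main__":
    unique = {"A": 30, "B": 10, "C": 0}
    multi = [{"A", "B"}, {"A", "B"}, {"B", "C"}, {"C", "D"}]
    print(distribute_multimapped(unique, multi))
    # {'A': 31.5, 'B': 11.5, 'C': 0.5, 'D': 0.5}
```

Because every read contributes exactly 1.0 in total, the corrected counts sum to the number of uniquely assigned reads plus the number of multimapped reads (40 + 4 = 44 in the example), so no read is discarded or double counted.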

Highlights

Detection and quantification of RNA transcripts is a critical step to understand the mechanism of gene expression and its impact on cell function
We have developed the Count Corrector (CoCo) package, which consists of three main modules: 1) the correct_annotation module, which generates gapped annotation files in which the regions of host gene transcript features that overlap nested genes are precisely removed (Fig. 1B); 2) the correct_count module, which recovers the reads associated with nested and multimapped genes using the modified annotation (Fig. 1D and E); and 3) the correct_bedgraph module, which produces accurate representations of paired-end reads (Supplementary Fig. 2)
To test the quantification accuracy of the CoCo pipeline, we examined its capacity to correctly assign and quantify sequencing reads using four RNA sequencing (RNA-seq) datasets, and compared its quantification with that of the main available read assignment pipelines
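The correct_annotation module described above removes, from host gene features, the regions that overlap nested genes. The core of that step is interval subtraction, sketched below for a single host feature. This is an illustration, not CoCo's code: the name `subtract_intervals` is hypothetical, coordinates are assumed to be 0-based and half-open on one chromosome (GTF coordinates are 1-based and closed, so they would need converting first), and deciding which genes count as nested, including any strand rule, is left to the caller.

```python
def subtract_intervals(feature, nested):
    """Hypothetical sketch: cut nested-gene intervals out of a host feature.

    feature: (start, end) of one host exon, 0-based half-open (assumed).
    nested: iterable of (start, end) intervals of nested genes on the same
        chromosome; they may overlap each other or extend past the feature.
    Returns the remaining pieces of the feature, sorted by start.
    """
    start, end = feature
    pieces = []
    cursor = start  # first position of the feature not yet processed
    for n_start, n_end in sorted(nested):
        if n_end <= cursor or n_start >= end:
            continue  # no overlap with the unprocessed part of the feature
        if n_start > cursor:
            pieces.append((cursor, n_start))  # keep the gap before the nested gene
        cursor = max(cursor, n_end)
        if cursor >= end:
            break  # the rest of the feature is covered by nested genes
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


if __name__ == "__main__":
    host_exon = (100, 500)
    nested_genes = [(200, 250), (240, 300), (450, 600)]
    print(subtract_intervals(host_exon, nested_genes))
    # [(100, 200), (300, 450)]
```

Reads falling on the removed regions can then be counted against the nested gene alone, rather than being shared with, or attributed to, the host gene.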

Summary

Introduction

Detection and quantification of RNA transcripts is a critical step to understand the mechanism of gene expression and its impact on cell function. Diverse library preparation protocols exist, the most commonly used of which focus on particular classes of RNA through enrichment steps. Such strategies include polyA enrichment, non-rRNA enrichment (e.g. rRNA depletion), small RNA enrichment and enrichment for RNAs bound to specific factors (Conesa et al., 2016; Hrdlickova et al., 2017; O'Neil et al., 2013). In the case of the CH507-513H4.1 locus, which hosts the miRNAs miR-3648 and miR-3687, the reads were originally attributed to the miRNAs despite the absence of corresponding peaks in the bedgraph. This inappropriate assignment is no longer observed following background correction (Supplementary Fig. 9).



Journal: Computer applications in the biosciences : CABIOS	Publication Date: May 29, 2019
Citations: 30	License type: CC BY 4.0


Similar Papers

The Human Mitochondrial Transcriptome
Tim R Mercer ... John S Mattick
Cell | VOL. 146
01 Aug 2011

Open reading frame dominance indicates protein-coding potential of RNAs.
Yusuke Suenaga ... Hiroyuki Kogashi
EMBO Reports | VOL. 23
19 Apr 2022

Identification of novel mRNAs and lncRNAs associated with mouse experimental colitis and human inflammatory bowel disease.
Carl Robert Rankin ... Charalabos Pothoulakis
American journal of physiology. Gastrointestinal and liver physiology | VOL. 315
28 Jun 2018

LuluDB-The Database Created Based on Small RNA, Transcriptome, and Degradome Sequencing Shows the Wide Landscape of Non-coding and Coding RNA in Yellow Lupine (Lupinus luteus L.) Flowers and Pods.
Paulina Glazinska ... Marta Wysocka
Frontiers in Genetics | VOL. 11
15 May 2020
