A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model.

Mickael Orgeur, Stefan T Börno, Sigmar Stricker, Delphine Duprez, Bernd Timmermann, Marvin Martens

doi:10.1242/bio.028498

Abstract

The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads, and of the gene annotation that defines gene features, must also be taken into account. A partially covered genome sequence causes the loss of sequencing reads at the mapping step, while an inaccurate definition of gene features yields imprecise read counts at the assignment step. Both effects can significantly bias the interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% compared with using only the chicken reference annotation, thereby contributing to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with a partial genome sequence and/or lacking a manually curated reference annotation in order to improve the accuracy of gene expression studies.

Highlights

Since its first release in 2004 and despite significant improvements over the past decade, the Gallus gallus genome is presently incomplete and highly fragmented (Hillier et al, 2004).
We performed RNA sequencing (RNA-seq) of two independent biological replicates of chick micromass cultures infected for 5 days with empty RCAS-BP(A) replication-competent retroviral particles.
While 86.7% of read pairs were mapped against the chicken genome, only 62.2% of read pairs were assigned to gene features (Table 1).
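
The gap between the mapping rate and the assignment rate reported above can be broken down with simple arithmetic. The following is a minimal sketch, not part of the original study: the function name `read_loss_breakdown` and its output keys are hypothetical, and it assumes that both percentages are expressed relative to the same total number of sequenced read pairs.

```python
def read_loss_breakdown(mapped_pct: float, assigned_pct: float) -> dict:
    """Split read pairs into unmapped, mapped-but-unassigned and assigned shares.

    Assumption (not stated in the source): both inputs are percentages of the
    same total number of sequenced read pairs, and every assigned read pair
    was first mapped.
    """
    if not 0 <= assigned_pct <= mapped_pct <= 100:
        raise ValueError("expected 0 <= assigned_pct <= mapped_pct <= 100")
    return {
        # Read pairs lost at the mapping step
        "unmapped_pct": 100 - mapped_pct,
        # Read pairs mapped to the genome but not overlapping any gene feature
        "mapped_unassigned_pct": mapped_pct - assigned_pct,
        # Share of the mapped read pairs that were assigned to gene features
        "assigned_of_mapped_pct": 100 * assigned_pct / mapped_pct,
    }


# Values quoted in the highlights: 86.7% mapped, 62.2% assigned
breakdown = read_loss_breakdown(86.7, 62.2)
for name, value in breakdown.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, about 24.5% of all read pairs map to the genome without being assigned to a gene feature, which is the fraction that an improved annotation could potentially recover.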

Summary

Introduction

Since its first release in 2004 and despite significant improvements over the past decade, the Gallus gallus genome is presently incomplete and highly fragmented (Hillier et al, 2004). The chicken karyotype is composed of 38 autosomal chromosomes (1-38) and two additional sex chromosomes (W, Z) (Bloom et al, 1993). Out of these autosomal chromosomes, 10 are macrochromosomes (1-10), with lengths similar to those in mammals, and 28 are microchromosomes (11-38). Chicken microchromosomes display a high recombination rate, contain an elevated number of repetitive elements and are GC-rich, which induces significant bias and sequencing errors when using high-throughput technologies (Chen et al, 2013; Dohm et al, 2008). The fourth version of the Gallus gallus genome (galGal4), released in November 2011, has not fully overcome these issues. The galGal4 genome sequence has a size of 1.05 Gb.

Journal: Biology Open
Publication Date: Jan 1, 2017
Citations: 7
License type: CC BY 3.0
