Abstract
The sequence of the chicken genome, like several other draft genome sequences, is not yet fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. The quality of the genome sequence used to map sequencing reads, and of the gene annotation that defines gene features, must also be taken into account. A partially covered genome sequence causes sequencing reads to be lost at the mapping step, while an inaccurate definition of gene features leads to imprecise read counts at the assignment step. Both effects can significantly bias the interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% compared with using only the chicken reference annotation, thereby contributing to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with a partial genome sequence and/or lacking a manually curated reference annotation in order to improve the accuracy of gene expression studies.
Highlights
Since its first release in 2004 and despite significant improvements over the past decade, the Gallus gallus genome remains incomplete and highly fragmented (Hillier et al, 2004)
We performed RNA sequencing (RNA-seq) of two independent biological replicates of chick micromass cultures infected for 5 days with empty RCAS-BP (A) replication-competent retroviral particles
While 86.7% of read pairs were mapped against the chicken genome, only 62.2% of read pairs were assigned to gene features (Table 1)
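The gap between the mapping rate and the assignment rate can be made concrete with a small calculation. The following is a minimal sketch, not the authors' pipeline: it assumes both percentages are expressed relative to the total number of read pairs (the text does not state this explicitly), and the function names and the read-pair counts are hypothetical values chosen only to reproduce the reported percentages.

```python
def rate(part: int, total: int) -> float:
    """Return part/total as a percentage (0.0 when total is zero)."""
    return 100.0 * part / total if total else 0.0


def summarize(total_pairs: int, mapped_pairs: int, assigned_pairs: int) -> dict:
    """Summarize read-pair losses at the mapping and assignment steps.

    Assumption: assigned pairs are a subset of mapped pairs, which are
    a subset of all sequenced pairs.
    """
    if not (0 <= assigned_pairs <= mapped_pairs <= total_pairs):
        raise ValueError("expected 0 <= assigned <= mapped <= total")
    return {
        # Pairs aligned to the genome sequence.
        "mapping_rate": rate(mapped_pairs, total_pairs),
        # Pairs counted for an annotated gene feature.
        "assignment_rate": rate(assigned_pairs, total_pairs),
        # Pairs aligned to the genome but outside any annotated feature.
        "mapped_but_unassigned": rate(mapped_pairs - assigned_pairs, total_pairs),
    }


# Hypothetical counts per 1,000 read pairs, matching 86.7% and 62.2%.
stats = summarize(total_pairs=1000, mapped_pairs=867, assigned_pairs=622)
for name, value in stats.items():
    print(f"{name}: {value:.1f}%")
```

Under that assumption, roughly a quarter of all read pairs align to the genome but fall outside annotated gene features, which is the fraction an improved annotation could help recover.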
Summary
Since its first release in 2004 and despite significant improvements over the past decade, the Gallus gallus genome remains incomplete and highly fragmented (Hillier et al, 2004). The chicken karyotype is composed of 38 autosomal chromosomes (1-38) and two additional sex chromosomes (W, Z) (Bloom et al, 1993). Of these autosomal chromosomes, 10 are macrochromosomes (1-10), with lengths similar to those in mammals, and 28 are microchromosomes (11-38). Chicken microchromosomes display a high recombination rate, contain an elevated number of repetitive elements and are GC-rich, which induces significant bias and sequencing errors when using high-throughput technologies (Chen et al, 2013; Dohm et al, 2008). The fourth version of the Gallus gallus genome (galGal4), released in November 2011, has not fully overcome these issues. The galGal4 genome sequence has a size of 1.05 Gb.