Abstract

Next-generation sequencing protocols such as RNA-seq have made the genome-wide characterization of the transcriptome a crucial part of many research projects in biology. Analyses of the resulting data provide key information on gene expression and, in certain cases, on exon or isoform usage. The emergence of transcript quantification software such as Salmon has enabled researchers to efficiently estimate isoform and gene expression across the genome while tremendously reducing the necessary computational power. Although overall gene expression estimates were shown to be accurate, isoform expression quantification appears to be a more challenging task. Low expression levels and uneven or insufficient coverage were reported as potential explanations for inconsistent estimates. Here, through the example of the ketohexokinase (Khk) gene in mouse, we demonstrate that the use of an incorrect gene annotation can also result in erroneous isoform quantification results. Manual correction of the input Khk gene model provided a much more accurate estimation of relative Khk isoform expression when compared to quantitative PCR (qPCR) measurements. In particular, removal of an unexpressed retained intron and a proper adjustment of the 5’ and 3’ untranslated regions both had a strong impact on the correction of erroneous estimates. Finally, we observed a better concordance in isoform quantification between datasets and sequencing strategies when relying on the newly generated Khk annotations. These results highlight the importance of accurate gene models and annotations for correct isoform quantification and reassert the need for orthogonal methods of estimation of isoform expression to confirm important findings.

Highlights

  • Accurate measurement of mRNA expression levels is a crucial component in many modern biological studies

  • Using DRIMSeq proportion estimations, we show that quantification of Khk isoforms as output by Salmon is biased by the presence of an annotated retained intron that is expressed at very low levels

  • Using traditional count-based methods, we confirmed the tissue-specificity of Khk expression and identified exons preferentially retained in some tissues
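
To make the notion of relative isoform expression concrete, the following is a minimal sketch of how per-transcript abundances for one gene can be turned into proportions, and how abundance assigned to a spurious annotated transcript shifts the proportions of the others. It is a plain ratio, not the Dirichlet-multinomial model used by DRIMSeq, and the transcript names and values are hypothetical toy inputs rather than results from this study.

```python
from typing import Dict


def isoform_proportions(abundances: Dict[str, float]) -> Dict[str, float]:
    """Return each transcript's share of the total abundance of its gene."""
    total = sum(abundances.values())
    if total == 0:
        # No signal for this gene: proportions are undefined, report zeros.
        return {tx: 0.0 for tx in abundances}
    return {tx: value / total for tx, value in abundances.items()}


def drop_transcript(abundances: Dict[str, float], name: str) -> Dict[str, float]:
    """Return a copy of the abundances without the named transcript."""
    return {tx: value for tx, value in abundances.items() if tx != name}


# Hypothetical abundance estimates (e.g. TPM-like values) for one gene
# annotated with two coding isoforms and one retained-intron transcript.
toy_abundances = {
    "isoform_1": 60.0,
    "isoform_2": 20.0,
    "retained_intron": 20.0,
}

# Proportions under the full (hypothetical) annotation.
with_intron = isoform_proportions(toy_abundances)

# Proportions after simply excluding the retained-intron transcript.
# In a real analysis the reads would be re-quantified against the corrected
# annotation and could be redistributed differently; this only shows the
# effect on the denominator.
without_intron = isoform_proportions(
    drop_transcript(toy_abundances, "retained_intron")
)

for tx in ("isoform_1", "isoform_2"):
    print(f"{tx}: {with_intron[tx]:.2f} -> {without_intron[tx]:.2f}")
```

The ratio is computed per gene so that proportions from different samples or sequencing strategies can be compared even when total expression differs.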

Introduction

Accurate measurement of mRNA expression levels is a crucial component in many modern biological studies. The emergence of Next Generation Sequencing (NGS) based protocols such as RNA-seq has overcome this limitation and enabled researchers to profile mRNA expression at the genome-wide level[1,2,3]. While such experiments are routinely performed, the subsequent bioinformatics analysis and data interpretation still pose computational challenges. A low number of replicates per condition, together with a high dynamic range in expression levels across the genome, requires appropriate statistical frameworks[4,5,6,7,8].
