Abstract

Motivation: Reliance on mapping to a single reference haplotype currently limits accurate estimation of allele or haplotype-specific expression using RNA-sequencing, notably in highly polymorphic regions such as the major histocompatibility complex.

Results: We present AltHapAlignR, a method incorporating alternate reference haplotypes to generate gene- and haplotype-level estimates of transcript abundance for any genomic region where such information is available. We validate using simulated and experimental data to quantify input allelic ratios for major histocompatibility complex haplotypes, demonstrating significantly improved correlation with ground truth estimates of gene counts compared to standard single reference mapping. We apply AltHapAlignR to RNA-seq data from 462 individuals, showing how significant underestimation of expression of the majority of classical human leukocyte antigen genes using conventional mapping can be corrected using AltHapAlignR to allow more accurate quantification of gene expression for individual alleles and haplotypes.

Availability and implementation: Source code freely available at https://github.com/jknightlab/AltHapAlignR.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • RNA sequencing (RNA-seq) enables high resolution quantification of transcription (Wang et al, 2009)

  • We compared our AltHapAlignR approach, which accounts for alternate haplotypes, with a standard mapping procedure using the GRCh38 reference sequence and with gene expression estimates produced by Salmon (Patro et al, 2017)

  • We randomly generated simulated reads (50 and 100 bp) from up to two of the eight major histocompatibility complex (MHC) reference haplotypes for each gene, with relative gene expression levels set at five different ratios (1:1, 1:1.125, 1:1.25, 1:1.5 and 1:2) between any pair of haplotypes
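
The simulation design in the last highlight can be illustrated with a minimal sketch. It only shows how a fixed read budget for one gene might be split between two haplotypes at each of the five ratios; it is not the authors' simulator. The haplotype labels (`hap1` to `hap8`), the function names and the read budget are assumptions made for illustration.

```python
import random

# Placeholder labels for the eight MHC reference haplotypes (hypothetical names).
HAPLOTYPES = [f"hap{i}" for i in range(1, 9)]

# The five relative expression ratios listed in the text.
RATIOS = [(1, 1), (1, 1.125), (1, 1.25), (1, 1.5), (1, 2)]


def split_reads(total_reads, ratio):
    """Split a read budget between two haplotypes according to ratio a:b."""
    a, b = ratio
    reads_a = round(total_reads * a / (a + b))
    return reads_a, total_reads - reads_a


def simulation_design(total_reads, seed):
    """Pick a random haplotype pair and list the read counts for each ratio."""
    rng = random.Random(seed)
    pair = tuple(rng.sample(HAPLOTYPES, 2))
    return [
        {"pair": pair, "ratio": ratio, "counts": split_reads(total_reads, ratio)}
        for ratio in RATIOS
    ]


if __name__ == "__main__":
    for row in simulation_design(total_reads=1000, seed=1):
        print(row["pair"], row["ratio"], row["counts"])
```

Rounding the first count and assigning the remainder to the second haplotype keeps the total read count fixed, so differences between ratios reflect only the allelic split.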


Summary

Introduction

RNA sequencing (RNA-seq) enables high resolution quantification of transcription (Wang et al, 2009). The development of longer read technologies for high-throughput sequencing will help address the difficulty of quantifying expression accurately from short reads, but given the large amounts of data that have already been generated, and will continue to be generated, using shorter reads, there is a need for innovative approaches to enable more accurate quantification. This is notably the case for highly polymorphic regions of the genome, such as the major histocompatibility complex (MHC), where gene and transcript level expression data may be of significant clinical and biological interest (Brandt et al, 2015; Lighten et al, 2014). Establishing causal mechanistic relationships between specific variants and expression of individual genes is a current priority in this field of research, and accurate quantification of transcription is a critical step in such studies (Knight, 2014).

