Abstract

Background

Allelic specific expression (ASE) increases our understanding of the genetic control of gene expression and its links to phenotypic variation. ASE testing is implemented through binomial or beta-binomial tests of sequence read counts of alternative alleles at a cSNP of interest in heterozygous individuals. This requires prior ascertainment of the cSNP genotypes for all individuals. To meet this need, we propose hidden Markov methods to call SNPs from next-generation RNA sequence data when ASE possibly exists.

Results

We propose two hidden Markov models (HMMs), HMM-ASE and HMM-NASE, that do and do not consider ASE, respectively, in order to improve genotyping accuracy. Both HMMs call the genotypes of several SNPs simultaneously and allow for mapping error, which respectively exploits the dependence among SNPs and corrects the bias due to mapping error. In addition, HMM-ASE exploits ASE information to further improve genotype accuracy when ASE is likely to be present.

Simulation results indicate that the proposed HMMs achieve very good prediction accuracy in terms of controlling both the false discovery rate (FDR) and the false negative rate (FNR). When ASE is present, HMM-ASE had a lower FNR than HMM-NASE, while both controlled the FDR at a similar level. A real data application demonstrates that, by exploiting linkage disequilibrium (LD), the proposed methods have better sensitivity than the VarScan method and a similar FDR in calling heterozygous SNPs. Sensitivity and FDR are similar to those of the BCFtools and Beagle methods. The resulting genotypes show good properties for the estimation of the genetic parameters and ASE ratios.

Conclusions

We introduce HMMs, which are able to exploit LD and account for ASE and mapping errors, to simultaneously call SNPs from next-generation RNA sequence data. The method introduced can reliably call cSNP genotypes even in the presence of ASE and under low sequencing coverage. As a byproduct, the proposed method is able to provide predictions of ASE ratios for the heterozygous genotypes, which can then be used for ASE testing.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0479-2) contains supplementary material, which is available to authorized users.

Highlights

  • Allelic specific expression (ASE) increases our understanding of the genetic control of gene expression and its links to phenotypic variation

  • While RNAseq is typically used for transcript-centric analysis, where differential expression of genes or transcripts is tested between treatments or tissues [2], it has recently been increasingly utilized for nucleotide-centric inferences such as coding SNP discovery [3], cSNP genotyping to estimate population parameters [4], or allelic specific expression [5,6]

  • Some algorithms and models have been tailored to perform this inference using RNAseq data [8,9,10,11], but most of them require prior ascertainment of cSNP genotypes to extract read counts for heterozygous sites or they require RNAseq or genomic sequence on parents of the individuals used for ASE testing to reliably infer cSNP genotypes

Introduction

Allelic specific expression (ASE) increases our understanding of the genetic control of gene expression and its links to phenotypic variation. ASE testing is implemented through binomial or beta-binomial tests of sequence read counts of alternative alleles at a cSNP of interest in heterozygous individuals. This requires prior ascertainment of the cSNP genotypes for all individuals. Most models do not include biological replication: they either assume a single replicate or treat all biological replicates alike and collapse counts down to the nucleotide level. These assumptions may not be too restrictive in F1 crosses of inbred strains of model organisms [12], for which exhaustive sequence resources are available and biological variation is minimal, but they become more problematic for outbred populations and their crosses [13], and even for crosses of inbred lines when the purpose is to focus on individual variation in ASE for breeding [14] or population genetics inferences [15].
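
As an illustration of the binomial test mentioned above, the following is a minimal sketch of an exact two-sided test of allelic read counts at a single cSNP. It is not the HMM-based genotype caller proposed in this article. It assumes the individual is already known to be heterozygous at the site, that the null allelic ratio is 0.5, and that there is no overdispersion (a beta-binomial test would relax this last assumption). The function names `binom_pmf` and `ase_binomial_test` are hypothetical and chosen only for this example.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of observing k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def ase_binomial_test(ref_reads, alt_reads, p_null=0.5):
    """Exact two-sided binomial test for allelic imbalance at one cSNP.

    Assumes the individual is heterozygous at this site (an assumption of
    this sketch). The p-value is the total probability, under the null
    ratio p_null, of all outcomes no more likely than the observed count.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    observed = binom_pmf(ref_reads, n, p_null)
    tol = 1e-12  # guards against floating-point ties
    p_value = 0.0
    for k in range(n + 1):
        prob = binom_pmf(k, n, p_null)
        if prob <= observed * (1 + tol):
            p_value += prob
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: 18 reference-allele reads, 2 alternative-allele reads.
    print(ase_binomial_test(18, 2))
```

A small p-value suggests that the two alleles are not expressed equally at that site. The test is only meaningful if the genotype call is correct, which is why reliable cSNP genotyping is a prerequisite for ASE testing.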
