Abstract

Expression Quantitative Trait Loci (eQTL) analysis enables characterisation of functional genetic variation influencing expression levels of individual genes. In outbred populations, including humans, eQTLs are commonly analysed using the conventional linear model, adjusting for relevant covariates and assuming an allelic dosage model and a Gaussian error term. However, gene expression data generally have noise that induces heavy-tailed errors relative to the Gaussian distribution, and they often include atypical observations, or outliers. Such departures from modelling assumptions can lead to an increased rate of type II errors (false negatives), and to some extent also type I errors (false positives). Careful model checking can reduce the risk of type I errors but often not type II errors, since it is generally too time-consuming to carefully check all models with a non-significant effect in large-scale and genome-wide studies. Here we propose the application of a robust linear model for eQTL analysis to reduce adverse effects of deviations from the assumption of Gaussian residuals. We present results from a simulation study as well as results from the analysis of real eQTL data sets. Our findings suggest that in many situations robust models have the potential to provide more reliable eQTL results than conventional linear models, particularly with respect to reducing type II errors due to non-Gaussian noise. Post-genomic data, such as those generated in genome-wide eQTL studies, are often noisy and frequently contain atypical observations. Robust statistical models have the potential to provide more reliable results and increased statistical power under non-Gaussian conditions. The results presented here suggest that robust models should be considered routinely alongside other commonly used methodologies for eQTL analysis.
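The contrast between the conventional and the robust linear model can be illustrated with a small sketch. The text above does not state which robust estimator the authors used, so the code below assumes a Huber M-estimator fitted by iteratively reweighted least squares, with a MAD-based scale and the common tuning constant k = 1.345. For brevity it regresses expression on allelic dosage only, without covariates. All function names and simulation settings (sample size, allele frequency, effect size, contamination rate) are hypothetical choices for illustration, not values from the paper.

```python
import random
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b*x. Returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def ols_line_fit(x, y):
    """Conventional (unweighted) least squares fit."""
    return weighted_line_fit(x, y, [1.0] * len(x))


def huber_line_fit(x, y, k=1.345, max_iter=50, tol=1e-8):
    """Huber M-estimate of (intercept, slope) via IRLS (assumed estimator)."""
    a, b = ols_line_fit(x, y)  # start from the least squares solution
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute deviation, rescaled so that
        # it estimates the standard deviation under Gaussian errors.
        med = statistics.median(resid)
        mad = statistics.median([abs(r - med) for r in resid])
        scale = max(mad / 0.6745, 1e-12)
        # Huber weights: 1 for small residuals, shrinking for large ones.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        a_new, b_new = weighted_line_fit(x, y, w)
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


def simulate_eqtl(n=200, maf=0.3, effect=0.5, seed=1):
    """Simulate dosages (0/1/2) and expression with heavy-tailed noise.

    Noise is a contaminated Gaussian: about 10% of observations get a
    tenfold larger standard deviation (a hypothetical setting).
    """
    rng = random.Random(seed)
    dosage = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    expr = []
    for g in dosage:
        sd = 10.0 if rng.random() < 0.1 else 1.0
        expr.append(effect * g + rng.gauss(0.0, sd))
    return dosage, expr


if __name__ == "__main__":
    dosage, expr = simulate_eqtl()
    print("OLS slope:  ", ols_line_fit(dosage, expr)[1])
    print("Huber slope:", huber_line_fit(dosage, expr)[1])
```

In practice one would more likely rely on an established implementation, for example `RLM` in the Python package statsmodels or `rlm` in the R package MASS, which also handle covariates and provide standard errors for testing the dosage effect.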

Highlights

  • Expression Quantitative Trait Loci (eQTL) analysis [1,2,3] is an important study design in functional genomics because it enables the characterisation of genetic sequence variants, commonly Single Nucleotide Polymorphisms (SNPs), that associate with mRNA expression levels of individual genes

  • Gene expression data are noisy, due both to the stochastic nature of biological systems and to technical noise. This inherent noise may invalidate the common assumption of Gaussian error terms in, for example, the linear models commonly used in Expression Quantitative Trait Loci (eQTL) analysis

  • This is a major drawback in many eQTL studies. Human studies, in addition, have to be adjusted for general covariates representing major phenotypes of the subjects, including gender, age, body mass index and batch effects


Summary

Introduction

Expression Quantitative Trait Loci (eQTL) analysis [1,2,3] is an important study design in functional genomics because it enables the characterisation of genetic sequence variants, commonly Single Nucleotide Polymorphisms (SNPs), that associate with mRNA expression levels of individual genes. eQTL analysis has been applied in a number of organisms, including humans [4, 5], mice [6, 7] and rats [8, 9], and has revealed that a substantial proportion of mRNA expression levels are influenced by genetic variation. In epidemiological and human studies it is often relevant to adjust the model for known covariates, for example gender, age, body mass index, disease status and batch effects [10]. This adjustment can be performed within a linear model. We present results from a comparative study of the standard and robust models based on two real eQTL data sets.
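As an illustration of the covariate-adjusted linear model mentioned above, one common way to write it is given below. The notation is an assumption made here rather than taken from the paper: y_i is the expression level of a gene in subject i, g_i is the allelic dosage, and c_ij are the covariates. The robust objective is shown with a generic loss function, with the Huber loss as one possible choice; this excerpt does not specify which loss the authors used.

```latex
% Allelic dosage model with covariate adjustment (notation assumed)
y_i = \beta_0 + \beta_g\, g_i + \sum_{j=1}^{p} \gamma_j\, c_{ij} + \varepsilon_i,
\qquad g_i \in \{0, 1, 2\}

% Conventional model: Gaussian errors, fitted by least squares
\varepsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad
\hat{\theta}_{\mathrm{LS}} = \arg\min_{\theta} \sum_{i=1}^{n} r_i(\theta)^2

% Robust M-estimation: squared loss replaced by a slower-growing loss \rho
\hat{\theta}_{\mathrm{M}} = \arg\min_{\theta} \sum_{i=1}^{n}
  \rho\!\left(\frac{r_i(\theta)}{\hat{\sigma}}\right),
\qquad
\rho_{\mathrm{Huber}}(u) =
\begin{cases}
  \tfrac{1}{2} u^2, & |u| \le k \\
  k\,|u| - \tfrac{1}{2} k^2, & |u| > k
\end{cases}
```

Here \theta = (\beta_0, \beta_g, \gamma_1, \dots, \gamma_p), r_i(\theta) is the residual of subject i, and \hat{\sigma} is a robust estimate of the residual scale. The eQTL test concerns whether \beta_g differs from zero.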

