Abstract

Characterizing genetic variations that are associated with gene expression levels is essential to understanding the cellular mechanisms that underlie human complex traits. Expression quantitative trait loci (eQTL) mapping attempts to identify genetic variants, such as single nucleotide polymorphisms (SNPs), that affect the expression of one or more genes. With a large volume of gene expression data now available, fast and efficient statistical and computational methods are needed to perform eQTL mapping at this scale. In this paper, we propose a new method for eQTL mapping, the low rank penalized regression method (LORSEN). We evaluated LORSEN and compared its performance with that of two existing eQTL mapping methods, using extensive simulations as well as real data from the HapMap3 project. Simulation studies showed that our method outperformed two commonly used eQTL mapping methods, LORS and FastLORS, in many scenarios in terms of area under the curve (AUC). We illustrate the usefulness of our method by applying it to SNP variant data and gene expression levels on four chromosomes from the HapMap3 project.

Highlights

  • With rapid advancements in sequencing and other high-throughput technologies, a large volume of single nucleotide polymorphism (SNP) data and gene expression data has become available

  • When the number of samples is much smaller than the number of SNPs and the number of causal SNPs is larger than the number of samples, Higher Criticism (HC)-Screening is seemingly not an appropriate screening tool

  • This is because the number of causal SNPs retained after HC-Screening is much smaller than the actual number of causal SNPs, which can lead to power loss in subsequent analysis


Summary

Introduction

With rapid advancements in sequencing and other high-throughput technologies, a large volume of single nucleotide polymorphism (SNP) data and gene expression data has become available. Another challenge in eQTL mapping is that the number of SNPs involved is usually very large (Yang et al., 2013). This places a heavy computational burden on the estimation of model parameters and generally reduces detection power if all SNPs are included in eQTL mapping. The reason is that the signal-to-noise ratio (SNR) is very low, meaning that only a very small portion of SNPs are associated with gene expression levels. A number of methods based on penalized regression have been developed to model this sparsity of eQTLs (Lee and Xing, 2012; Yang et al., 2013; Cheng et al., 2014; Jeng et al., 2020).
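To make the idea of sparsity-inducing penalized regression concrete, the following is a minimal sketch of an ordinary lasso fit by cyclic coordinate descent. It is not LORSEN, LORS, or FastLORS: it has no low-rank component, and the function names, the simulated genotypes, the choice of causal SNPs, and the penalty value `lam` are all assumptions made for illustration only.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t (proximal operator of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows (samples), each a list of p values (SNPs).
    No intercept is fitted, so X and y should be centered beforehand.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant (monomorphic) column carries no signal
            # Covariance of column j with the residual that excludes SNP j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Hypothetical toy data: 40 samples, 100 SNPs coded 0/1/2, two causal SNPs.
random.seed(0)
n, p = 40, 100
X = [[float(random.choice((0, 1, 2))) for _ in range(p)] for _ in range(n)]
y = [1.5 * row[3] - 2.0 * row[17] + random.gauss(0.0, 0.5) for row in X]

# Center each SNP column and the expression vector (no intercept in the model).
col_means = [sum(row[j] for row in X) / n for j in range(p)]
Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
y_mean = sum(y) / n
yc = [v - y_mean for v in y]

coef = lasso_coordinate_descent(Xc, yc, lam=0.3)
selected = [j for j, v in enumerate(coef) if v != 0.0]
print("SNPs with nonzero coefficients:", selected)
```

The L1 penalty sets most coefficients exactly to zero, which is how such models express the assumption that only a few SNPs are associated with a given expression level. Increasing `lam` yields a sparser fit; in practice the penalty would typically be tuned, for example by cross-validation.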

