Supervised structural learning of semiparametric regression on high-dimensional correlated covariates with applications to eQTL studies.

Wei Liu,Ying Wei,Yanyuan Ma,Yi Li,Li Liu,Huazhen Lin

doi:10.1002/sim.9769

Abstract

Expression quantitative trait loci (eQTL) studies utilize regression models to explain the variance of gene expressions with genetic loci or single nucleotide polymorphisms (SNPs). However, regression models for eQTL are challenged by the presence of high dimensional non-sparse and correlated SNPs with small effects, and nonlinear relationships between responses and SNPs. Principal component analyses are commonly conducted for dimension reduction without considering responses. Because of that, this non-supervised learning method often does not work well when the focus is on discovery of the response-covariate relationship. We propose a new supervised structural dimensional reduction method for semiparametric regression models with high dimensional and correlated covariates; we extract low-dimensional latent features from a vast number of correlated SNPs while accounting for their relationships, possibly nonlinear, with gene expressions. Our model identifies important SNPs associated with gene expressions and estimates the association parameters via a likelihood-based algorithm. A GTEx data application on a cancer related gene is presented with 18 novel eQTLs detected by our method. In addition, extensive simulations show that our method outperforms the other competing methods in bias, efficiency, and computational cost.

Full Text