Abstract
Introduction: High-dimensional multivariate data curated from high-throughput biological assays in omics, brain networks, medical imaging, and psychometric instruments contain network features. Multivariate linear regression is a standard model that fits these data as response variables and the participant characteristics as explanatory variables. Often, the number of response variates is larger than the number of observations. In this setting, a structured covariance model is necessary to preserve the network features of the response data, and sparsity induction helps reduce the number of unknown parameters in the large variance-covariance matrix.
Method: This study investigated an approach to solving multivariate linear regression for multivariate-normal distributed response variables using a sparsity-induced latent precision matrix. The multivariate linear regression coefficients were derived from an algorithm that estimates the precision matrix as a plug-in parameter using different Gaussian Graphical Models (GGMs). The Bioconductor tool “sparsenetgls”, developed from this algorithm, was applied to case studies of real omics datasets. Data simulations were also used to compare different GGM estimation methods in multivariate linear regression.
Results: GGM multivariate linear regression (GGM-MLS) advances multivariate regression. When the number of observations is smaller than the number of response variates, GGM-MLS tackles the problem through sparsity induction in the covariance matrix. An analytical proof suggests that estimating the precision matrix of the response variables and estimating the regression coefficients in GGM-MLS are two independent processes. Simulation studies and case studies consistently suggested that the regression coefficient estimates of GGM-MLS are similar to the estimates from linear mixed regression with only the variance terms in the covariance matrix. Furthermore, the GGM-MLS method reduces the variance (standard errors) of the regression coefficients both when the number of observations is smaller than the number of response variates and when it is not.
Keywords: GGM in multivariate linear regression, network outcome responses, omics data analysis, sparsity induction in multivariate linear regression
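The plug-in step described in the Method can be illustrated with a minimal sketch of generalized least squares that takes a precision matrix as given. This is not the sparsenetgls API or the authors' implementation: the function names (`gls_plugin`, `sandwich_cov`), the model layout (one shared coefficient vector, a per-observation design array `Z`), and the simulated tridiagonal precision matrix are all assumptions made for illustration. The true simulated precision matrix stands in for what a GGM estimator would supply in practice.

```python
import numpy as np


def gls_plugin(Y, Z, omega):
    """Generalized least squares with a plug-in precision matrix.

    Assumed (hypothetical) layout: Y is (n, p) responses, Z is (n, p, q)
    designs, omega is the (p, p) precision matrix of each response row.
    Returns the coefficient estimate (q,) and its model-based covariance.
    """
    # A = sum_i Z_i^T Omega Z_i and b = sum_i Z_i^T Omega y_i
    A = np.einsum("ipa,pr,irb->ab", Z, omega, Z)
    b = np.einsum("ipa,pr,ir->a", Z, omega, Y)
    cov = np.linalg.inv(A)
    return cov @ b, cov


def sandwich_cov(Z, weight, sigma):
    """True covariance of the weighted estimator when errors have covariance sigma."""
    A_inv = np.linalg.inv(np.einsum("ipa,pr,irb->ab", Z, weight, Z))
    middle = np.einsum("ipa,pr,irb->ab", Z, weight @ sigma @ weight, Z)
    return A_inv @ middle @ A_inv


# Simulated example with fewer observations than response variates.
rng = np.random.default_rng(0)
n, p, q = 20, 30, 2

# Sparse (tridiagonal) precision matrix: an assumption, not data from the paper.
omega_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
sigma_true = np.linalg.inv(omega_true)

Z = rng.standard_normal((n, p, q))
beta_true = np.array([1.0, -0.5])
chol = np.linalg.cholesky(sigma_true)
E = rng.standard_normal((n, p)) @ chol.T  # rows have covariance sigma_true
Y = Z @ beta_true + E

# Full precision matrix plugged in (stand-in for a GGM estimate).
beta_gls, cov_gls = gls_plugin(Y, Z, omega_true)

# Variance-only weights: keep the diagonal of the covariance, drop the network.
omega_diag = np.diag(1.0 / np.diag(sigma_true))
beta_diag, _ = gls_plugin(Y, Z, omega_diag)
cov_diag = sandwich_cov(Z, omega_diag, sigma_true)

print("full precision :", beta_gls, np.sqrt(np.diag(cov_gls)))
print("variance only  :", beta_diag, np.sqrt(np.diag(cov_diag)))
```

In this sketch both weightings recover coefficients close to the simulated truth, while the full precision matrix gives standard errors no larger than the variance-only weights, which is the general efficiency property of generalized least squares rather than a reproduction of the paper's results. With real data the precision matrix would be estimated, so the comparison would also depend on the quality of that estimate.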