A parametric framework for multidimensional linear measurement error regression.

Stanley Luck, Sriparna Saha

doi:10.1371/journal.pone.0262148

Abstract

The ordinary linear regression method is limited to bivariate data because it is based on the Cartesian representation y = f(x). Using the chain rule, we transform the method to the parametric representation (x(t), y(t)) and obtain a linear regression framework in which a weighted average serves as the parameter of a multivariate linear relation among a set of linearly related variable vectors (LRVVs). We confirm the proposed approach by a Monte Carlo simulation, in which the minimum coefficient of variation for error (CVE) provides the optimal weights when forming a weighted average of LRVVs. Then, we describe a parametric linear regression (PLR) algorithm in which the Moore-Penrose pseudoinverse is used to estimate measurement error regression (MER) parameters individually for the given variable vectors. We demonstrate that MER parameters from the PLR and nonlinear ODRPACK methods are quite similar for a wide range of reliability ratios, but ODRPACK is formulated only for bivariate data. We identify scale-invariant quantities for the PLR and weighted orthogonal regression (WOR) methods and their correspondences with the partitioned residual effects between the variable vectors. Thus, the specification of an error model for the data is essential for MER, and we discuss the use of Monte Carlo methods for estimating the distributions and confidence intervals for the MER slope and correlation coefficient. We distinguish between the elementary covariance for the y = f(x) representation and the covariance vector for the (x(t), y(t)) representation. We also discuss the multivariate generalization of the Pearson correlation as the contraction between Cartesian polyad alignment tensors for the LRVVs and weighted average. Finally, we demonstrate the use of multidimensional PLR in estimating the MER parameters for replicate RNA-Seq data, and of quadratic regression in estimating the parameters of the conical dispersion of read count data about the MER line.
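The parametric idea in the abstract can be illustrated with a short worked equation. This is only a sketch for the noise-free case; the symbols a_i, b_i (intercept and slope of each variable vector with respect to the parameter) and w_i (weights) are notation assumed here for illustration and may differ from the notation used in the paper.

```latex
% Assumed notation: each variable vector lies on a line in the parameter t,
% and t is itself a weighted average of the variable vectors.
\begin{align}
  x_i(t) &= a_i + b_i\, t, \qquad i = 1, \dots, n, \\
  t &= \sum_{i=1}^{n} w_i\, x_i(t)
     = \sum_{i=1}^{n} w_i a_i + t \sum_{i=1}^{n} w_i b_i
  \;\;\Longrightarrow\;\;
  \sum_{i=1}^{n} w_i a_i = 0, \quad \sum_{i=1}^{n} w_i b_i = 1, \\
  \frac{dx_j}{dx_i} &= \frac{dx_j/dt}{dx_i/dt} = \frac{b_j}{b_i}
  \qquad \text{(chain rule, } b_i \neq 0\text{)}.
\end{align}
```

Under these assumptions, the Cartesian slope between any pair of variables is recovered as a ratio of parametric slopes, which is why the construction is not restricted to two variables.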

Highlights

In this work, we consider the problem of fitting a multidimensional line to data that are subject to stochastic error.
We develop a novel multidimensional linear regression algorithm where the relation between variable vectors is parameterized by a weighted average, and the weights are determined from an error model E for the input data.
The implementation of parametric linear regression (PLR) involves the use of weighted least squares normal equations to estimate the parameters of the best fit line for a set of linearly related variable vectors (LRVVs).
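
The highlights above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `plr_fit` is hypothetical, the weights are taken as given rather than derived from an error model, and an unweighted least-squares solve via the Moore-Penrose pseudoinverse stands in for the weighted normal equations.

```python
import numpy as np

def plr_fit(X, weights):
    """Fit x_i ~ a_i + b_i * t for every column of X, with t = X @ weights.

    X       : array of shape (n_samples, n_vars), one column per variable vector
    weights : array of shape (n_vars,); assumed to be supplied by the caller
              (in the paper they come from an error model, not reproduced here)
    Returns (intercepts, slopes), each of shape (n_vars,).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize so t is a weighted average
    t = X @ w                            # parameter: weighted average per sample
    A = np.column_stack([np.ones_like(t), t])   # design matrix [1, t]
    coef = np.linalg.pinv(A) @ X         # least-squares fit of each column on t
    return coef[0], coef[1]

# Illustrative data: two noisy measurements of the same latent quantity.
rng = np.random.default_rng(0)
latent = np.linspace(0.0, 10.0, 200)
x1 = latent + rng.normal(scale=0.3, size=latent.size)
x2 = 1.0 + 2.0 * latent + rng.normal(scale=0.3, size=latent.size)
X = np.column_stack([x1, x2])

intercepts, slopes = plr_fit(X, weights=[0.5, 0.5])
slope_x2_vs_x1 = slopes[1] / slopes[0]   # slope between x2 and x1 as a ratio
```

In this sketch the slope of one variable against another is obtained as a ratio of the per-variable slopes, so neither variable is singled out as the error-free predictor. How close the ratio is to the true slope depends on how well the chosen weights match the actual error structure of the data.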

Summary

Introduction

We consider the problem of fitting a multidimensional line to data that are subject to stochastic error. The motivation for this work comes from a collaborative R&D effort involving the application of genome-wide association study (GWAS) [1] and eQTL [2] methods to identify beneficial agronomic variation in maize. This led to our applied algebraic investigation of the merits of various effect size measures and their associated statistical methodologies.
