Regression and residual analysis in linear models with interval censored data

Rebekka Topp

doi:10.17877/de290r-390

Abstract

This work consists of two parts, both concerned with regression analysis for interval censored data. An interval censored observation x cannot be observed exactly; only an interval [x_L, x_R] that contains the true value x with probability one is known. In the first part of this work I develop an estimation theory for the regression parameters of the linear model where both the dependent and the independent variable are interval censored. To this end I use a semi-parametric maximum likelihood approach, which determines the parameter estimates by maximizing the likelihood function of the data. Since the density function of the covariate is unknown due to interval censoring, the maximization problem is solved by an algorithm which first determines the unknown density function of the covariate and then maximizes the complete data likelihood function. The unknown covariate density is determined nonparametrically through a modification of the approach of Turnbull (1976). The resulting parameter estimates are given under the assumption that the distribution of the model errors belongs to the exponential family or is Weibull. In addition I extend my estimation theory to the case where the regression model includes both an interval censored and an uncensored covariate. Since the derivation of the theoretical statistical properties of the developed parameter estimates is rather complex, simulations were carried out to assess the quality of the estimates. The simulations show that the estimated values of the regression parameters are always very close to the true ones. Finally, some alternative estimation methods for this regression problem are discussed. In the second part of this work I develop a residual theory for the linear regression model where the covariate is interval censored but the dependent variable can be observed exactly. In this case the model errors are interval censored, and so are the residuals. 
This leads to the problem that the residuals are not directly observable, which is solved in the following way: Since one assumption of the linear regression model is the N(0, σ²)-distribution of the model errors, it follows that the distribution of the interval censored errors is a truncated normal distribution, the truncation being determined by the observed model error intervals. Consequently, the distribution of the interval censored residuals is a -distribution, truncated in the respective residual interval, where the estimation of the residual variance is accomplished through the method of Gomez et al. (2002). In a simulation study I compare the behaviour of the residuals constructed in this way with those of Gomez et al. (2002) and with a naive type of residuals which takes the midpoint of the residual interval as the observed residual. The results show that my residuals can be used in most of the simulated scenarios, whereas this is not the case for the other two types of residuals. Finally, my new residual theory is applied to a data set from a clinical study.
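
The idea of interval censored residuals described above can be illustrated with a minimal Python sketch. It is not the implementation from the thesis. It assumes a simple model y = alpha + beta * x with one interval censored covariate, normal errors with mean zero, and a standard deviation sigma that is already known; the thesis estimates the residual variance with the method of Gomez et al. (2002), which is not reproduced here. All function and parameter names are hypothetical. The abstract does not state how the truncated distribution is turned into a single residual value, so both its mean and a random draw are shown, next to the naive midpoint residual.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def residual_interval(y, x_left, x_right, alpha, beta):
    """Interval [r_L, r_R] covering y - alpha - beta*x for x in [x_left, x_right].

    alpha and beta are assumed to be already estimated regression parameters.
    """
    ends = (y - alpha - beta * x_left, y - alpha - beta * x_right)
    return min(ends), max(ends)


def midpoint_residual(r_left, r_right):
    """Naive residual: the midpoint of the residual interval."""
    return 0.5 * (r_left + r_right)


def truncated_normal_mean(r_left, r_right, sigma):
    """Mean of a N(0, sigma^2) variable truncated to [r_left, r_right].

    Illustrative only: fails if the interval has numerically zero
    probability mass (an interval far out in the tails).
    """
    a, b = r_left / sigma, r_right / sigma
    mass = STD_NORMAL.cdf(b) - STD_NORMAL.cdf(a)
    return sigma * (STD_NORMAL.pdf(a) - STD_NORMAL.pdf(b)) / mass


def draw_truncated_normal(r_left, r_right, sigma, rng):
    """One draw from N(0, sigma^2) truncated to [r_left, r_right] (inverse CDF)."""
    a, b = r_left / sigma, r_right / sigma
    u = rng.uniform(STD_NORMAL.cdf(a), STD_NORMAL.cdf(b))
    return sigma * STD_NORMAL.inv_cdf(u)


if __name__ == "__main__":
    # Hypothetical observation: y observed exactly, x only known to lie in [1, 3].
    r_left, r_right = residual_interval(5.0, 1.0, 3.0, alpha=1.0, beta=1.0)
    rng = random.Random(0)
    print("interval:", (r_left, r_right))
    print("midpoint:", midpoint_residual(r_left, r_right))
    print("truncated mean:", truncated_normal_mean(r_left, r_right, sigma=1.0))
    print("truncated draw:", draw_truncated_normal(r_left, r_right, 1.0, rng))
```

For a residual interval that is not symmetric around zero, the truncated normal mean is pulled towards zero relative to the midpoint, because the normal density puts more mass on the end of the interval closer to zero. This is one way in which residuals based on the truncated distribution can behave differently from the naive midpoint residuals.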
