Survival analysis using neural network hazard model with incomplete covariate data

Yi Yu, Yong Sun, Lin Ma, Yuan-Tong Gu

doi:10.1109/icqr2mse.2011.5976602

Abstract

This paper presents a new procedure for survival analysis when some covariate data are unavailable. A neural network hazard model is used to model the relationship between the covariates and the hazard. To accommodate incomplete covariates, the hidden-layer target data are represented as binary random variables. This allows the training of the two-layer neural network hazard model to be decomposed into the training of two single-layer structures. Training the input-hidden structure then becomes a logistic estimation problem in which part of the input and all of the output (the hidden-layer targets) are missing. However, this logistic estimation has two major problems. First, it requires assumptions about the distribution of the partially observed covariates. Second, estimating the logistic function becomes complicated when the input data contain missing values. Therefore, instead of a logistic function, the general location model is adopted to represent the mixed data set with missing values. Training the input-hidden structure thus becomes maximisation of the likelihood of the mixed continuous data (covariates) and categorical data (hidden-layer targets) under the general location model. The hidden-layer targets link the two single-layer structures and are updated iteratively. After each update, the expected values of the hidden-layer targets are used to train the hidden-output structure of the neural network hazard model. This structure is then the same as a generalised linear model (GLM) and is trained by iteratively reweighted least squares (IRLS). Training of the input-hidden and hidden-output structures alternates until the estimates converge. The new approach is applied to a set of bearing data, parts of which are deliberately deleted to create different realisations of an incomplete covariate set. The numerical study demonstrates that the approach can handle incomplete covariate data in survival analysis and that its results outperform those of conventional approaches to handling incomplete covariates.
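Once the expected hidden-layer targets are fixed, the hidden-output structure is a GLM trained by IRLS. The abstract does not state the output link, so the following is only a minimal sketch that assumes a logistic output (a discrete-time hazard with binary event indicators); the function name `irls_logistic` and its parameters are hypothetical. Each iteration forms weights and a working response from the current fit and then solves one weighted least-squares problem.

```python
import numpy as np


def irls_logistic(Z, y, max_iter=50, tol=1e-10):
    """Fit y ~ Bernoulli(sigmoid(Z @ w)) by iteratively reweighted least squares.

    Hypothetical helper. Z: (n, k) design matrix, e.g. expected hidden-layer
    targets plus a bias column. y: (n,) binary event indicators (assumed
    discrete-time hazard setting). Returns hidden-output weights w of shape (k,).
    """
    w = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        eta = Z @ w                           # linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))        # fitted hazard under the assumed logistic link
        W = np.maximum(p * (1.0 - p), 1e-12)  # IRLS weights; floor avoids division by zero
        z = eta + (y - p) / W                 # working response
        # One weighted least-squares step: solve (Z^T W Z) w_new = Z^T W z
        w_new = np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (W * z))
        if np.max(np.abs(w_new - w)) < tol:   # stop when the weights stop moving
            return w_new
        w = w_new
    return w
```

At convergence the score `Z.T @ (y - p)` is numerically zero, which is a convenient check. If the paper's hazard output uses a different link or an offset, mainly the mean, weight and working-response lines would change.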
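The input-hidden step can be illustrated with an EM algorithm for a general location model in which the categorical variable (the hidden-layer target) is never observed and some continuous covariates are missing. This is a minimal sketch under stated assumptions, not the authors' procedure. It assumes a single categorical variable with `n_cells` cells (for example two cells for one binary hidden node), cell-specific means with one shared covariance, and covariates missing at random with at least one observed entry per row. The names `glom_em` and `_log_gauss` are hypothetical. The posterior cell probabilities `R` stand in for the expected hidden-layer targets that would be passed to the hidden-output step.

```python
import numpy as np


def _log_gauss(x, mu, S):
    """Log density of the multivariate normal N(mu, S) evaluated at x."""
    L = np.linalg.cholesky(S)
    u = np.linalg.solve(L, x - mu)            # whitened residual L^{-1}(x - mu)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + u @ u) - np.log(np.diag(L)).sum()


def glom_em(X, n_cells=2, n_iter=50, seed=0):
    """EM for a general location model whose categorical variable is never observed.

    Hypothetical helper. X: (n, p) covariates with np.nan for missing entries;
    every row needs at least one observed entry. Assumed model: cell c has
    probability pi[c], and x | c ~ N(mu[c], S) with S shared by all cells.
    Returns pi, mu, S, posterior cell probabilities R (n, n_cells) and the
    observed-data log-likelihood evaluated at each E-step.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = ~np.isnan(X)
    pi = np.full(n_cells, 1.0 / n_cells)
    mu = np.nanmean(X, axis=0) + rng.normal(size=(n_cells, p)) * np.nanstd(X, axis=0)
    S = np.diag(np.nanvar(X, axis=0))
    history = []
    for _ in range(n_iter):
        # E-step: posterior cell probabilities and conditional moments of missing entries
        R = np.zeros((n, n_cells))
        Xhat = np.zeros((n, n_cells, p))      # E[x_i | observed part, cell c]
        C = np.zeros((n, p, p))               # Cov[x_i | observed part]; same for every cell
        loglik = 0.0
        for i in range(n):
            o, m = obs[i], ~obs[i]
            S_oo = S[np.ix_(o, o)]
            K = S[np.ix_(m, o)] @ np.linalg.inv(S_oo)   # regression of missing on observed
            C[i][np.ix_(m, m)] = S[np.ix_(m, m)] - K @ S[np.ix_(o, m)]
            logw = np.array([np.log(pi[c]) + _log_gauss(X[i, o], mu[c, o], S_oo)
                             for c in range(n_cells)])
            top = logw.max()
            w = np.exp(logw - top)            # log-sum-exp for numerical stability
            loglik += top + np.log(w.sum())
            R[i] = w / w.sum()
            for c in range(n_cells):
                Xhat[i, c, o] = X[i, o]
                Xhat[i, c, m] = mu[c, m] + K @ (X[i, o] - mu[c, o])
        history.append(loglik)
        # M-step: closed-form updates of pi, the cell means and the shared covariance
        Nc = R.sum(axis=0)
        pi = Nc / n
        mu = np.einsum('ic,icp->cp', R, Xhat) / Nc[:, None]
        D = Xhat - mu[None, :, :]
        S = (np.einsum('ic,icp,icq->pq', R, D, D) + C.sum(axis=0)) / n
    return pi, mu, S, R, history
```

Because EM never decreases the observed-data log-likelihood, `history` should be non-decreasing, which is a useful sanity check. The cell labels are arbitrary, so which column of `R` corresponds to a hidden-node value of 1 is a convention.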

Full Text