Statistical analysis of cost data is often difficult because the data are highly skewed: a few patients incur high costs relative to the majority. When the objective is to predict the cost for an individual patient, the literature suggests choosing a regression model on the basis of the quality of its predictions. In exploring these econometric issues, the objective of this study was to estimate a cost function for the annual health care cost of dementia. Using different models, health care costs were regressed on the degree of dementia, sex, age, marital status and the presence of any co-morbidity other than dementia. The models considered included models with a log-transformed dependent variable, in which predicted health care costs were re-transformed to the original (unlogged) scale by multiplying the exponential of the expected response on the log scale by the average of the exponentiated residuals. The root mean square error (RMSE), the mean absolute error (MAE) and the Theil U-statistic were used to assess which model best predicted the health care cost. On each criterion, larger values indicate poorer predictive performance. Based on these criteria, a two-part model was chosen. In this model, the probability of incurring any costs was estimated using a logistic regression, while the level of the costs was estimated in the second part of the model. The choice of model had a substantial impact on the predicted health care costs: for a mildly demented patient, for example, the estimated annual health care costs varied from DKK 71 273 to DKK 90 940 (US$ 1 = DKK 7) depending on which model was chosen. For the two-part model, the estimated health care costs ranged from DKK 44 714 for a very mildly demented patient to DKK 197 840 for a severely demented patient.
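The re-transformation and the prediction criteria described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's implementation: all function names are hypothetical, the two-part prediction assumes that the second part models costs conditional on any costs being incurred, and the abstract does not say which form of the Theil U-statistic was used, so the version below (root of the summed squared errors divided by the summed squared observed values) is one common choice among several.

```python
import math


def smearing_retransform(log_predictions, log_residuals):
    """Re-transform log-scale predictions to the original cost scale.

    Multiplies exp(prediction) by the average of the exponentiated
    log-scale residuals, as described in the abstract.
    """
    smearing_factor = sum(math.exp(r) for r in log_residuals) / len(log_residuals)
    return [math.exp(p) * smearing_factor for p in log_predictions]


def two_part_prediction(prob_any_cost, cost_if_positive):
    """Combine both parts of a two-part model (assumed form):
    expected cost = P(cost > 0) * E[cost | cost > 0]."""
    return [p * c for p, c in zip(prob_any_cost, cost_if_positive)]


def rmse(observed, predicted):
    """Root mean square error; larger values mean poorer predictions."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mae(observed, predicted):
    """Mean absolute error; larger values mean poorer predictions."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def theil_u(observed, predicted):
    """One common form of Theil's U (an assumption, see the text above)."""
    numerator = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(o ** 2 for o in observed)
    return math.sqrt(numerator / denominator)


if __name__ == "__main__":
    # Made-up numbers for illustration only; they are not data from the study.
    observed_costs = [0.0, 40000.0, 85000.0, 210000.0]
    log_scale_fit = [10.2, 10.7, 11.3, 12.0]    # hypothetical log-scale predictions
    log_scale_resid = [-0.4, 0.1, 0.3, 0.0]     # hypothetical log-scale residuals
    prob_any_cost = [0.60, 0.90, 0.95, 0.99]    # hypothetical first-part output

    level = smearing_retransform(log_scale_fit, log_scale_resid)
    predicted = two_part_prediction(prob_any_cost, level)
    for name, criterion in (("RMSE", rmse), ("MAE", mae), ("Theil U", theil_u)):
        print(name, round(criterion(observed_costs, predicted), 3))
```

Candidate models would each be scored in this way on the same observations, and the model with the smallest values on the criteria would be preferred.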