Abstract
Background
Missing data is a common statistical problem in healthcare datasets from populations of older people. Some argue that arbitrarily assuming the mechanism responsible for the missingness, and therefore the method for dealing with it, is not the best option, but is this always true? This paper explores what happens when extra information suggesting that a particular mechanism is responsible for missing data is disregarded and methods for dealing with the missing data are chosen arbitrarily.
Methods
Regression models based on 2,533 intermediate care (IC) patients from the largest evaluation of IC done and published in the UK to date were used to explain variation in costs, EQ-5D and Barthel index. Three methods for dealing with missingness were used, each assuming a different mechanism as being responsible for the missing data: complete case analysis (assuming missing completely at random, MCAR), multiple imputation (assuming missing at random, MAR) and a Heckman selection model (assuming missing not at random, MNAR). Differences in results were gauged by examining the signs of coefficients as well as the sizes of both coefficients and associated standard errors.
Results
Extra information strongly suggested that missing cost data were MCAR. The MCAR- and MAR-based methods yielded similar results, with sizes of most coefficients and standard errors differing by less than 3.4%, while those from the MNAR-based method were statistically different (up to 730% bigger). Significant variables in all regression models also had the same direction of influence on costs. All three mechanisms of missingness were shown to be potential causes of the missing EQ-5D and Barthel data. For these data, the method chosen to deal with missingness did not seem to have any significant effect on the results: all methods led to broadly similar conclusions, with sizes of coefficients and standard errors differing by less than 54% and 322%, respectively.
Conclusions
Arbitrary selection of methods to deal with missing data should be avoided. Using extra information gathered during the data collection exercise about the cause of missingness to guide this selection would be more appropriate.
Highlights
Missing data is a common statistical problem in healthcare datasets from populations of older people
Missing data are said to be ignorable if the parameters used to model the missing-data process are unrelated to the parameters used to model the observed data; non-ignorability exists if there is a systematic difference between responders and nonresponders even after accounting for all the observed data [7, 9]
The most appropriate method of dealing with this amount of missingness had to be determined [19, 49]. The results of this analysis have shown that, in determining the methods to deal with missing data, using extra information gathered during the data collection exercise about the cause of missingness, rather than the arbitrary selection of such methods, is more appropriate
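The ignorability distinction in the highlights above can be written out formally. The notation here is an assumption, since the source does not define symbols: R is the missingness indicator, Y_obs and Y_mis are the observed and missing parts of the data, φ denotes the parameters of the missingness process, and θ the parameters of the data model. This is a sketch of the usual textbook definitions, not a restatement of the paper's own equations.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \phi) = P(R \mid \phi) \\
\text{MAR:}\quad  & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \phi) = P(R \mid Y_{\text{obs}}, \phi) \\
\text{MNAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \phi) \text{ still depends on } Y_{\text{mis}}
\end{align*}
```

Under these definitions, missingness is typically called ignorable for likelihood-based inference when the data are MAR (or MCAR) and φ and θ are distinct. MNAR is the non-ignorable case, in which responders and nonresponders differ even after conditioning on everything observed.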
Summary
Missing data is a common statistical problem in healthcare datasets from populations of older people. Three methods for dealing with missingness were used, each assuming a different mechanism as being responsible for the missing data: complete case analysis (assuming missing completely at random, MCAR), multiple imputation (assuming missing at random, MAR) and a Heckman selection model (assuming missing not at random, MNAR). Croninger and Douglas [7] indicate that the choice of method for coping with missing data is not crucial when little data is missing and/or the sample is large, because most methods will yield similar results in such circumstances. When data are MNAR, panel selection models, including the Heckman model, and pattern-mixture approaches can be used [15, 16, 17].
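To make the three mechanisms named above concrete, the following is a minimal simulation sketch. Everything in it is an assumption for illustration: the data are synthetic rather than the IC dataset, the `age` and `cost` variables and all coefficients are invented, and the function names are hypothetical. The regression adjustment is a deliberately simple stand-in for a MAR-based method. It is neither multiple imputation nor a Heckman selection model.

```python
import math
import random
import statistics


def simulate_patients(n=20000, seed=0):
    """Return (age, cost) pairs from a made-up linear model (not the IC data)."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        age = rng.gauss(80, 6)                              # always observed
        cost = 1000 + 50 * (age - 80) + rng.gauss(0, 300)   # may go missing
        patients.append((age, cost))
    return patients


def probability_missing(age, cost, mechanism):
    """Assumed probability that cost is unobserved under each mechanism."""
    if mechanism == "MCAR":
        return 0.3                                          # unrelated to any variable
    if mechanism == "MAR":
        return 1 / (1 + math.exp(-(age - 80) / 3))          # depends on observed age only
    if mechanism == "MNAR":
        return 1 / (1 + math.exp(-(cost - 1000) / 200))     # depends on the unseen cost
    raise ValueError(f"unknown mechanism: {mechanism}")


def complete_cases(patients, mechanism, seed=1):
    """Keep only the patients whose cost happens to be observed."""
    rng = random.Random(seed)
    return [(age, cost) for age, cost in patients
            if rng.random() >= probability_missing(age, cost, mechanism)]


def regression_adjusted_mean(patients, observed):
    """Fit cost ~ age on complete cases, then average predictions over everyone."""
    mean_age = statistics.fmean(a for a, _ in observed)
    mean_cost = statistics.fmean(c for _, c in observed)
    sxy = sum((a - mean_age) * (c - mean_cost) for a, c in observed)
    sxx = sum((a - mean_age) ** 2 for a, _ in observed)
    slope = sxy / sxx
    intercept = mean_cost - slope * mean_age
    return statistics.fmean(intercept + slope * a for a, _ in patients)


def compare_mechanisms():
    """Estimate mean cost two ways under each missingness mechanism."""
    patients = simulate_patients()
    results = {"true_mean": statistics.fmean(c for _, c in patients)}
    for mechanism in ("MCAR", "MAR", "MNAR"):
        observed = complete_cases(patients, mechanism)
        results[mechanism] = {
            "complete_case": statistics.fmean(c for _, c in observed),
            "regression_adjusted": regression_adjusted_mean(patients, observed),
        }
    return results


if __name__ == "__main__":
    for name, value in compare_mechanisms().items():
        print(name, value)
```

Under these assumed settings, the complete-case mean should sit close to the full-data mean when costs are MCAR. Under MAR it drifts away, but conditioning on the observed covariate recovers it. Under MNAR neither estimate recovers it, which is the situation selection models such as the Heckman are intended to address.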