Abstract

Disclosure limitation is an important consideration in the release of public use data sets. It is particularly challenging for longitudinal data sets, since information about an individual accumulates with repeated measures over time. Research on disclosure limitation methods for longitudinal data has been very limited. We consider here problems created by high ages in cohort studies. Because of the risk of disclosure, ages of very old respondents can often not be released; in particular, this is a specific stipulation of the Health Insurance Portability and Accountability Act (HIPAA) for the release of health data for individuals. Top-coding of individuals beyond a certain age is a standard way of dealing with this issue, and it may be adequate for cross-sectional data, when a modest number of cases are affected. However, this approach leads to serious loss of information in longitudinal studies when individuals have been followed for many years. We propose and evaluate an alternative to top-coding for this situation based on multiple imputation (MI). This MI method is applied to a survival analysis of simulated data, and data from the Charleston Heart Study (CHS), and is shown to work well in preserving the relationship between hazard and covariates.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.