Background
Health services research often relies on secondary data, which must be checked for completeness, validity, and potential errors before use. Methods for handling implausible data include data elimination, statistical estimation, and value substitution from the same or another dataset. This study presents an internal validation process for a secondary dataset used to investigate hospital compliance with minimum caseload requirements (MCR) in Germany. The validated data source is the German Hospital Quality Reports (GHQR), an official dataset containing structured self-reported data from all hospitals in Germany.

Methods
This study conducted an internal cross-field validation of MCR-related data in the GHQR from 2016 to 2021. The validation process checked the validity of reported MCR caseloads, including data availability and consistency, by comparing the stated MCR caseload with further variables in the GHQR. Subsequently, implausible MCR caseload values were corrected using the most plausible values given in the same GHQR. The study also analysed the error sources and used reimbursement-related Diagnosis Related Groups Statistic data to assess the validation outcomes.

Results
The analysis focused on four MCR procedures. Of the total MCR caseload values in the GHQR, 11.8–27.7% appeared ambiguous and 7.9–23.7% were corrected. The correction added 0.7–3.7% of cases not previously stated as MCR caseloads and added 1.5–26.1% of hospital sites as MCR-performing hospitals not previously stated in the GHQR. The main error source was non-reporting of MCR caseloads, especially by hospitals with low case numbers. The basic plausibility control implemented by the Federal Joint Committee since 2018 has improved the MCR-related data quality over time.

Conclusions
This study employed a comprehensive approach to internal dataset validation that encompassed (1) hospital association level data, (2) hospital site level data, (3) medical department level data, (4) report data spanning six years, and (5) logical plausibility checks. To ensure data completeness, we selected the most plausible values rather than eliminating incomplete or implausible data. For future practice, we recommend applying a validation process when using the GHQR as a data source for MCR-related research. Additionally, an adapted plausibility control could help to improve the quality of MCR documentation.
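
The cross-field check described in the Methods can be illustrated with a minimal sketch. It is not the study's actual procedure: the field names (`stated_mcr_caseload`, `site_procedure_count`, `dept_procedure_count`) are hypothetical stand-ins for the "further variables" in the GHQR. The decision rule is also an assumption made only for illustration: a stated value is corrected only when the reference fields agree with one another.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SiteReport:
    """One hospital site's report for one MCR procedure (hypothetical fields)."""
    site_id: str
    year: int
    stated_mcr_caseload: Optional[int]   # dedicated MCR field; None if not reported
    site_procedure_count: Optional[int]  # assumed: count from site-level procedure data
    dept_procedure_count: Optional[int]  # assumed: sum over medical departments


def validate(report: SiteReport) -> Tuple[str, Optional[int]]:
    """Compare the stated caseload with the reference fields of the same report.

    Returns (status, value), where status is one of
    'unchecked', 'consistent', 'corrected', 'ambiguous'.
    """
    stated = report.stated_mcr_caseload
    references = [
        v for v in (report.site_procedure_count, report.dept_procedure_count)
        if v is not None
    ]
    if not references:
        # No other field is available to compare against.
        return ("unchecked", stated)
    if stated is not None and all(v == stated for v in references):
        return ("consistent", stated)
    if len(set(references)) == 1:
        # Assumed rule: references that agree with each other are taken
        # as the most plausible value (this also covers non-reporting).
        return ("corrected", references[0])
    # The references disagree with each other: flag it, keep the stated value.
    return ("ambiguous", stated)


reports = [
    SiteReport("A", 2018, 12, 12, 12),
    SiteReport("B", 2018, None, 3, 3),   # caseload not reported
    SiteReport("C", 2018, 10, 8, 11),    # fields disagree
]
results = {r.site_id: validate(r) for r in reports}
```

In this sketch a non-reported caseload (site B) is filled from the agreeing reference fields, whereas conflicting fields (site C) are only flagged. This is one possible way to distinguish values that appear ambiguous from values that are corrected.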