Abstract

The well-known hazards of repurposing data make Data Quality (DQ) assessment a vital step towards ensuring valid results regardless of analytical methods. However, there is no systematic process to implement DQ assessments for secondary uses of clinical data. This paper presents DataGauge, a systematic process for designing and implementing DQ assessments to evaluate repurposed data for a specific secondary use. DataGauge is composed of five steps: (1) Define information needs, (2) Develop a formal Data Needs Model (DNM), (3) Use the DNM and DQ theory to develop goal-specific DQ assessment requirements, (4) Extract DNM-specified data, and (5) Evaluate according to DQ requirements. DataGauge’s main contribution is integrating general DQ theory and DQ assessment methods into a systematic process. This process supports the integration and practical implementation of existing Electronic Health Record-specific DQ assessment guidelines. DataGauge also provides an initial theory-based guidance framework that ties the DNM to DQ testing methods for each DQ dimension to aid the design of DQ assessments. This framework can be augmented with existing DQ guidelines to enable systematic assessment. DataGauge sets the stage for future systematic DQ assessment research by defining an assessment process capable of adapting to a broad range of clinical datasets and secondary uses. It also opens new research directions, such as DQ theory integration, DQ requirement portability, DQ assessment tool development, and DQ assessment tool usability.

Highlights

  • There is growing interest in the reuse of routinely-collected clinical data for comparative effectiveness research, patient-centered outcomes research and clinical quality improvement [1]

  • DataGauge proposes that the three stages of quality assessment be completed by iteratively executing five concrete steps: (1) Define information needs based on the analysis question and analytical methods, (2) Develop a Data Needs Model (DNM) that formalizes the data needs, (3) Develop analysis-specific data quality (DQ) requirements based on the analytical purpose, the DNM and the dimensions of DQ, (4) Extract data from the source dataset to fit the DNM, and (5) Evaluate the extract against the DQ requirements and flag all data that infringe on the DQ assessment standard (an illustrative sketch of step 5 follows these highlights)

  • To enable DQ portability, we propose defining a formal, machine-readable DQ requirement encoding standard [70] and testing the viability of DQ requirement portability across repurposed clinical datasets and secondary purposes
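The following sketch illustrates how step (5) of the process could be implemented in practice: a DNM-shaped extract is evaluated against named, testable DQ requirements, and every record that infringes on any requirement is flagged. All column names, thresholds, and rules below are hypothetical examples chosen for illustration; they are not taken from the DataGauge paper, which does not prescribe a specific implementation.

```python
# Minimal sketch of DataGauge step (5): evaluate a DNM-specified extract
# against goal-specific DQ requirements and flag infringing records.
# Column names, ranges, and rule names are hypothetical.
import pandas as pd

# Hypothetical extract conforming to a Data Needs Model (DNM) for a
# blood-pressure analysis: one row per encounter.
extract = pd.DataFrame({
    "patient_id":   ["p1", "p2", "p3", "p4"],
    "encounter_dt": pd.to_datetime(["2021-03-01", "2021-03-02", None, "2021-03-04"]),
    "systolic_bp":  [120, 310, 118, None],   # mmHg
})

# DQ requirements expressed as named, testable rules (completeness and
# plausibility dimensions). Each rule returns True where the data pass.
dq_requirements = {
    "encounter_dt_complete": lambda df: df["encounter_dt"].notna(),
    "systolic_bp_complete":  lambda df: df["systolic_bp"].notna(),
    # Plausibility is only judged where a value is present; missingness is
    # already captured by the completeness rule above.
    "systolic_bp_plausible": lambda df: df["systolic_bp"].between(50, 250)
                                        | df["systolic_bp"].isna(),
}

# Evaluate every rule and flag rows that infringe on any requirement.
violations = pd.DataFrame({name: ~rule(extract) for name, rule in dq_requirements.items()})
extract["dq_flagged"] = violations.any(axis=1)

print(extract[["patient_id", "dq_flagged"]])
print("Violations per requirement:")
print(violations.sum())
```

In this sketch, the flagged records would be reviewed or excluded before analysis, and the per-requirement violation counts provide a simple summary of how well the extract meets the goal-specific DQ assessment standard.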


Introduction

There is growing interest in the reuse of routinely-collected clinical data for comparative effectiveness research, patient-centered outcomes research and clinical quality improvement [1]. Tool-driven methods focus on directly detecting data flaws such as inaccuracies rather than supporting assessment design to evaluate ‘fitness for purpose’ [9, 10]. These methods do not enable systematic assessments because they do not provide a fixed, reproducible sequence of steps. DQ guidelines and theory-based methods define approaches to assess data adherence to more abstract DQ concepts [14, 15]. Though these methods have much greater potential for broad applicability and generalizability, they usually fail to provide explicit implementation guidance for designing and executing DQ assessments [10, 15]. We discuss DataGauge’s ability to integrate past DQ work in the field, its contributions, and future research that would further enable the reliable reuse of clinical data through the development of DQ assessment pipelines.
