Background and Aim: With the growth of research over federated repositories, it is desirable to utilize high-quality integrated data repositories (IDRs). An IDR can be defined as a data warehouse optimized for research purposes rather than clinical care, containing clinical, administrative, trial, and -omics data. In this work, we focus on the quality of the clinical and administrative components of IDRs. There is no standardized methodology for quantitatively evaluating the quality of an IDR (e.g., ‘Does a given IDR have at least 2000 adult patients with type 1 diabetes and a complete pediatric history?’). With the increased interest in research on existing data (such as comparative effectiveness research) and the increasing number of institutions with comprehensive IDRs, it is important to have a mechanism for selecting high-quality IDRs.

Methods: Our poster presents a set of IDR quality measures that allow IDRs to be compared in terms of size and completeness. We considered the following criteria for a good measure: it is intuitive to interpret; it facilitates monitoring improvement; and it does not place any arbitrary value on individual measure components. Our methodology proposes a hierarchy of definitions of minimum EHR elements and uses a simple count at each level to quantitatively evaluate an IDR (e.g., the count of patients with at least one diagnosis and one laboratory result).

Results: We applied our methodology to an IDR at Marshfield Clinic. Our poster will list all measures and results. Selected results include 1.7M unique patients (level G1) and 0.4M patients with at least one diagnosis, laboratory result, and prescription (level D3). To facilitate evaluations at other institutions, we have created an ANSI-SQL script that computes all measures in a single execution.

Conclusion: Our evaluation methodology provides a quick way to compare IDRs across institutions and can be applied to institutions contributing to a virtual warehouse. Our goal was to arrive at a pragmatic set of measures operating on an easy-to-implement event schema. Limitations include the focus on a general research use case and the particular event types and criteria included in some level definitions. We plan to conduct a Delphi study involving informatics experts to arrive at an improved consensus set of measures.
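To illustrate the level counts, the following ANSI-SQL sketch computes the G1 and D3 measures. It is a minimal illustration under assumptions, not the actual script: the event table and its patient_id and event_type columns are hypothetical stand-ins for the event schema described above.

-- Minimal sketch (hypothetical schema): one row per clinical event,
-- with columns patient_id and event_type ('DIAGNOSIS', 'LAB', 'PRESCRIPTION').

-- Level G1: count of unique patients.
SELECT COUNT(DISTINCT patient_id) AS g1_unique_patients
FROM event;

-- Level D3: count of patients with at least one diagnosis, one lab result,
-- and one prescription.
SELECT COUNT(*) AS d3_patients
FROM (
    SELECT patient_id
    FROM event
    WHERE event_type IN ('DIAGNOSIS', 'LAB', 'PRESCRIPTION')
    GROUP BY patient_id
    HAVING COUNT(DISTINCT event_type) = 3
) AS qualifying;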