The impact of standardizing the definition of visits on the consistency of multi-database observational health research.

Erica A Voss, Qianli Ma, Patrick B Ryan

doi:10.1186/s12874-015-0001-6

Open Access

https://doi.org/10.1186/s12874-015-0001-6

Journal: BMC Medical Research Methodology
Publication Date: Mar 8, 2015
Citations: 32
License type: CC BY 4.0

Affiliation: Janssen (United States)

Abstract

Background: Use of administrative claims from multiple sources for research purposes is challenged by the lack of consistency in the structure of the underlying data and definition of data across claims data providers. This paper evaluates the impact of applying a standardized revenue code-based logic for defining inpatient encounters across two different claims databases.

Methods: We selected members who had complete enrollment in 2012 from the Truven MarketScan Commercial Claims and Encounters (CCAE) and the Optum Clinformatics (Optum) databases. The overall prevalence of inpatient conditions in the raw data was compared to that in the common data model (CDM) with the standardized visit definition applied.

Results: In CCAE, 87.18% of claims from 2012 that were classified as part of inpatient visits in the raw data were also classified as part of inpatient visits after the data were standardized to CDM, and this overlap was consistent from 2006 to 2011. In contrast, Optum had 83.18% concordance in classification of 2012 claims from inpatient encounters before and after standardization, but the consistency varied over time. The re-classification of inpatient encounters substantially impacted the observed prevalence of medical conditions occurring in the inpatient setting and the consistency in prevalence estimates between the databases. On average, before standardization, each condition in Optum was 12% more prevalent than that same condition in CCAE; after standardization, the prevalence of conditions had a mean difference of only 1% between databases. Amongst 7,039 conditions reviewed, the difference in the prevalence of 67% of conditions in these two databases was reduced after standardization.

Conclusions: In an effort to improve consistency in research results across databases, one should review sources of database heterogeneity, such as the way data holders process raw claims data. Our study showed that applying the Observational Medical Outcomes Partnership (OMOP) CDM with a standardized approach for defining inpatient visits during the extract, transform, and load process can decrease the heterogeneity observed in disease prevalence estimates across two different claims data sources.

Electronic supplementary material: The online version of this article (doi:10.1186/s12874-015-0001-6) contains supplementary material, which is available to authorized users.
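The concordance figures reported in the abstract (87.18% for CCAE, 83.18% for Optum) can be read as the share of claims flagged inpatient in the raw data that remain inpatient after standardization. The following is a minimal sketch of that arithmetic, not the authors' code: the `Claim` type, its field names, and the toy records are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    # Hypothetical record layout; the field names are illustrative only.
    claim_id: str
    raw_inpatient: bool  # classified as inpatient in the vendor's raw data
    cdm_inpatient: bool  # classified as inpatient after standardization to the CDM


def inpatient_concordance(claims):
    """Percentage of raw-inpatient claims that are also inpatient in the CDM."""
    raw_inpatient = [c for c in claims if c.raw_inpatient]
    if not raw_inpatient:
        raise ValueError("no claims are classified as inpatient in the raw data")
    still_inpatient = sum(1 for c in raw_inpatient if c.cdm_inpatient)
    return 100.0 * still_inpatient / len(raw_inpatient)


# Toy data, not taken from either database.
toy_claims = [
    Claim("c1", raw_inpatient=True, cdm_inpatient=True),
    Claim("c2", raw_inpatient=True, cdm_inpatient=True),
    Claim("c3", raw_inpatient=True, cdm_inpatient=False),
    Claim("c4", raw_inpatient=False, cdm_inpatient=True),
]

print(f"{inpatient_concordance(toy_claims):.2f}% concordance")
```

Claims that are inpatient only after standardization (such as `c4` in the toy data) do not enter this percentage, because the denominator is restricted to claims that were inpatient in the raw data.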

Highlights

Use of administrative claims from multiple sources for research purposes is challenged by the lack of consistency in the structure of the underlying data and definition of data across claims data providers
We found that Commercial Claims and Encounters (CCAE) followed the same inpatient classification pattern in each year from 2006 to 2011 as it did in 2012, while the classification in Optum Clinformatics (Optum) varied over time
Amongst 7,039 conditions reviewed, the difference in the prevalence of 67% of conditions in these two databases was reduced after standardization, while 23% of condition prevalence estimates did not change after standardization
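One way to tally how many conditions moved closer together between two databases is sketched below. It is an illustration under stated assumptions, not the study's method: the source does not specify the exact difference metric, so the absolute difference in prevalence is assumed here, and the function names, tolerance, and toy values are hypothetical.

```python
from collections import Counter


def prevalence_gap(prev_a, prev_b):
    # Assumed metric: absolute difference in prevalence between two databases.
    return abs(prev_a - prev_b)


def summarize_gap_changes(before, after, tol=1e-9):
    """Share of conditions whose between-database gap was reduced, unchanged,
    or increased. `before` and `after` map a condition name to a pair
    (prevalence in database A, prevalence in database B)."""
    if not before:
        raise ValueError("no conditions to compare")
    counts = Counter()
    for condition, (a0, b0) in before.items():
        a1, b1 = after[condition]
        change = prevalence_gap(a1, b1) - prevalence_gap(a0, b0)
        if change < -tol:
            counts["reduced"] += 1
        elif change > tol:
            counts["increased"] += 1
        else:
            counts["unchanged"] += 1
    total = sum(counts.values())
    return {k: counts[k] / total for k in ("reduced", "unchanged", "increased")}


# Toy prevalence values, not taken from the study.
toy_before = {"A": (0.010, 0.014), "B": (0.020, 0.020),
              "C": (0.005, 0.006), "D": (0.030, 0.033)}
toy_after = {"A": (0.011, 0.012), "B": (0.020, 0.020),
             "C": (0.005, 0.008), "D": (0.030, 0.031)}

print(summarize_gap_changes(toy_before, toy_after))
```

The small tolerance keeps floating-point noise from turning a condition whose gap did not change into a spurious "reduced" or "increased" case.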

Summary

Introduction

Use of administrative claims from multiple sources for research purposes is challenged by the lack of consistency in the structure of the underlying data and definition of data across claims data providers. Claims data are maintained and used for research through various institutions, including government agencies (e.g., Centers for Medicare and Medicaid Services Research Data Assistance Center [CMS ResDAC]), large payers with affiliated research arms (e.g., HealthCore, Optum), and claims processors who aggregate and license data (e.g., IMS, Truven). In the Optum Clinformatics database [4], all medical service claims are maintained in a single data table, which contains a field to indicate claims associated with an inpatient confinement. In both cases, the choice of data structure and definition of inpatient classification could be considered reasonable approaches in preparing the data for research purposes. It is unknown whether the inconsistency in inpatient definitions across data vendors can have a material impact on research findings or hinder cross-database comparisons of analysis results.