Healthcare Research and Analytics Data Infrastructure Solution: A Data Warehouse for Health Services Research.

Bunyamin Ozaydin, Sue S Feldman, Ferhat Zengul, Nurettin Oner

doi:10.2196/18579

Open Access

Abstract

Background: Health services researchers spend a substantial amount of time performing integration, cleansing, interpretation, and aggregation of raw data from multiple public or private data sources. Often, each researcher (or someone on their team) duplicates this effort for their own project, facing the same challenges and falling into the same pitfalls discovered by those before them.

Objective: This paper describes a design process for creating a data warehouse that includes the most frequently used databases in health services research.

Methods: The design is based on a conceptual iterative process model framework that uses the sociotechnical systems theory approach and includes the capacity for subsequent updates of the existing data sources and the addition of new ones. We introduce the theory and the framework and then explain how they inform the methodology of this study.

Results: We describe the application of the iterative process model to the design research process of problem identification and solution design for the Healthcare Research and Analytics Data Infrastructure Solution (HRADIS). Each phase of the iterative model produced end products that informed the implementation of HRADIS. The analysis phase produced the problem statement and requirements documents. The projection phase produced a list of tasks and goals for the ideal system. Finally, the synthesis phase provided a plan for implementing HRADIS. HRADIS structures and integrates the data dictionaries provided by the data sources, allowing the creation of dimensions and measures for a multidimensional business intelligence system. We discuss how HRADIS is complemented with a set of data mining, analytics, and visualization tools that enable researchers to apply multiple methods to a given research project more efficiently. HRADIS also includes a built-in security and account management framework for data governance, which provides customized authorization based on user roles and the parts of the data each role is authorized to access.

Conclusions: To address existing inefficiencies during the obtaining, extracting, preprocessing, cleansing, and filtering stages of data processing in health services research, we envision HRADIS as a full-service data warehouse integrating frequently used data sources, processes, and methods along with a variety of data analytics and visualization tools. This paper presents the application of the iterative process model to build such a solution. It also discusses several prominent issues, lessons learned, reflections and recommendations, and future considerations that arose as this model was applied.
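The abstract names two mechanisms only at a high level: deriving dimensions and measures from the sources' data dictionaries, and restricting each user role to the parts of the data it is authorized to access. The Python sketch below illustrates both ideas under stated assumptions. The `DictionaryEntry` fields, the rule that numeric variables become measures, the role names, and the variable names are all hypothetical and are not taken from HRADIS itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class DictionaryEntry:
    """One variable from a source's data dictionary (hypothetical, simplified)."""
    source: str       # data source the variable comes from
    variable: str     # column name in the raw data file
    description: str  # human-readable label from the dictionary
    data_type: str    # "numeric" or "categorical" in this sketch


def classify(entries: List[DictionaryEntry]) -> Tuple[List[DictionaryEntry], List[DictionaryEntry]]:
    """Split entries into dimensions (categorical) and measures (numeric).

    The numeric-means-measure rule is an assumption for illustration;
    a real warehouse design would review each variable individually.
    """
    dimensions: List[DictionaryEntry] = []
    measures: List[DictionaryEntry] = []
    for entry in entries:
        if entry.data_type == "numeric":
            measures.append(entry)
        else:
            dimensions.append(entry)
    return dimensions, measures


# Hypothetical grants: role -> set of (source, variable) pairs the role may read.
ROLE_GRANTS: Dict[str, Set[Tuple[str, str]]] = {
    "public_data_user": {("BLS", "state"), ("BLS", "median_wage")},
    "full_access_user": {("BLS", "state"), ("BLS", "median_wage"),
                         ("AHA", "hospital_id"), ("AHA", "it_adoption_score")},
}


def authorized_entries(role: str, entries: List[DictionaryEntry]) -> List[DictionaryEntry]:
    """Keep only entries granted to the role; unknown roles get nothing (deny by default)."""
    grants = ROLE_GRANTS.get(role, set())
    return [e for e in entries if (e.source, e.variable) in grants]


# Hypothetical dictionary entries used only to exercise the functions above.
entries = [
    DictionaryEntry("BLS", "state", "State", "categorical"),
    DictionaryEntry("BLS", "median_wage", "Median hourly wage", "numeric"),
    DictionaryEntry("AHA", "hospital_id", "Hospital identifier", "categorical"),
    DictionaryEntry("AHA", "it_adoption_score", "IT adoption score", "numeric"),
]

dims, meas = classify(entries)
print([e.variable for e in dims])  # ['state', 'hospital_id']
print([e.variable for e in meas])  # ['median_wage', 'it_adoption_score']
print([e.variable for e in authorized_entries("public_data_user", entries)])  # ['state', 'median_wage']
```

Denying access by default for unknown roles is a deliberate choice in this sketch: a role that is misspelled or not yet configured sees no data rather than all of it.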

Highlights

The paper describes the application of the iterative process model to the design research process of problem identification and solution design for the Healthcare Research and Analytics Data Infrastructure Solution (HRADIS).
As part of the Research step, we downloaded raw data files and data layout and/or data dictionary files for all available data releases from the following data sources, which are the ones most frequently used by health services researchers: Centers for Medicare and Medicaid Services (CMS) Medicare cost reports (MCR), impact/final rule files, HCAHPS, the Area Health Resources Files, the American Hospital Association (AHA) annual survey and information technology (IT) supplement, the Dartmouth Atlas, and the Bureau of Labor Statistics (BLS).
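The Research step pairs every release of every source with both its raw data and its layout or dictionary file. One way to check that such a download is complete is sketched below in Python. The `Release` structure, the file names, and the years are hypothetical placeholders, not the actual HRADIS inventory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Release:
    """One downloaded release of a data source (hypothetical structure)."""
    year: int
    data_files: List[str] = field(default_factory=list)        # raw data files
    dictionary_files: List[str] = field(default_factory=list)  # data layout and/or dictionary files


def incomplete_releases(catalog: Dict[str, List[Release]]) -> List[Tuple[str, int, List[str]]]:
    """List (source, year, missing parts) for releases lacking raw data or a layout/dictionary file."""
    problems = []
    for source, releases in catalog.items():
        for release in releases:
            missing = []
            if not release.data_files:
                missing.append("data")
            if not release.dictionary_files:
                missing.append("dictionary")
            if missing:
                problems.append((source, release.year, missing))
    return problems


# Hypothetical catalog entries; file names and years are placeholders.
catalog = {
    "Dartmouth Atlas": [Release(2015, ["atlas_2015.csv"], ["atlas_2015_dictionary.pdf"])],
    "BLS": [
        Release(2017, ["bls_2017.csv"], []),
        Release(2018, [], ["bls_2018_layout.txt"]),
    ],
}

for problem in incomplete_releases(catalog):
    print(problem)
# ('BLS', 2017, ['dictionary'])
# ('BLS', 2018, ['data'])
```

Treating a release without its dictionary as incomplete reflects the point made in the abstract that HRADIS builds its dimensions and measures from the dictionaries the sources provide.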

Summary

Introduction

A variety of data sources are frequently used in health services research, a multidisciplinary research field that investigates how factors such as social determinants, organizational structures and processes, technologies, financing and reimbursement, and individual choices and behaviors affect access to and quality of health care delivery, as well as the overall health and well-being of individuals [1]. Without the needed information technology (IT) infrastructure, analytics, and data visualization tools, the potential of the ever-growing body of health-related big data accumulated in these disparate datasets would remain untapped [3]. Health services researchers spend a substantial amount of time performing integration, cleansing, interpretation, and aggregation of raw data from multiple public or private data sources. Data warehouse systems also exist outside of individual institutions, such as systems used for public health purposes [5,6].

Objectives

Methods

Results

Discussion

Conclusion