Source Files Research Articles

Research Objective. Existing administrative health datasets, such as state‐level all‐payer claims datasets and the Healthcare Cost and Utilization Project (HCUP), are useful resources for researchers but have crucial limitations, such as a lack of national representativeness and the inability to track patients across payers (e.g., Medicare, Medicaid, and commercial) and claim types (inpatient, outpatient, and emergency department) with national data. To overcome some of these limitations and to address important use cases for administrative health data, we developed the Synthetic Healthcare Data for Research (SyH‐DR), a nationally representative, partially synthetic, multi‐payer, administrative health dataset.

Study Design. We drew a representative sample of about 20 million beneficiaries covered by Medicare, Medicaid, and a commercial payer. Our source data included hospital‐based services received by these beneficiaries, as well as filled prescriptions for individuals that received hospital services. After harmonizing the data, we constructed person‐level weights with iterative proportional fitting using control totals from the American Community Survey data for population counts by key demographic domains at geographic granularity and HCUP claims data for claims counts by key demographic group and diagnosis. We then employed machine learning methods to create a synthetic version of this dataset, in order to balance analytic utility with patient privacy and to respect constraints imposed by data use agreements.

Population Studied. SyH‐DR is a nationally representative sample of persons who were insured either by a government program (Medicare, Medicaid, or CHIP) or commercial health insurance at any point during 2016.

Principal Findings. We developed a nationally representative, multi‐payer, synthetic claims dataset. The synthetization methodology that we implemented produced synthetic values for claims‐level variables that were similar to the distributions of the variables from the source data. We confirmed that weighted person‐level estimates were similar in the SyH‐DR and benchmark nationally representative surveys, and that distributions of the variables were similar in the de‐identified and source files. The database has a wide variety of uses including tracking patients over time, comparison of demographic and clinical information across granular geographic areas and payers, and analyzing prescription drug and hospital service usage for the same individuals.

Conclusions. Datasets developed in recent years have led to advances in researchers' understanding of population health, care experiences, and healthcare costs in the United States, but there does not currently exist a nationally representative all‐payer claims dataset. SyH‐DR complements existing claims datasets by allowing researchers to track patient experiences over claim types using a nationally representative sample of individuals from Medicare, Medicaid, and commercial payers. As the first nationally representative all‐payer claims database, this database will be able to answer many questions regarding public health and healthcare quality which were previously unanswerable or very difficult to answer.

Implications for Policy or Practice. A de‐identified version of the SyH‐DR will be made available to health researchers. In addition, SyH‐DR provides a blueprint for how a representative multi‐payer administrative health dataset can be constructed in a way that balances the needs of various stakeholders, including researchers, patients, and data providers.

Primary Funding Source. Agency for Healthcare Research and Quality.
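The abstract above says that person‐level weights were built with iterative proportional fitting (raking) against control totals. The following is a minimal sketch of that general technique, not the authors' implementation: the function name `rake_weights`, the record layout, the two dimensions (`age`, `region`), and every number are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rake_weights(records, margins, max_iter=100, tol=1e-9):
    """Iterative proportional fitting (raking) of person-level weights.

    records: list of dicts, each with a "weight" key and one key per dimension.
    margins: {dimension: {category: control_total}} (hypothetical controls).
    Returns a list of adjusted weights, in the same order as records.
    """
    weights = [float(r["weight"]) for r in records]
    for _ in range(max_iter):
        max_change = 0.0
        for dim, targets in margins.items():
            # Current weighted total in each category of this dimension.
            totals = defaultdict(float)
            for r, w in zip(records, weights):
                totals[r[dim]] += w
            # Rescale each weight so category totals match the control totals.
            for i, r in enumerate(records):
                factor = targets[r[dim]] / totals[r[dim]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass over all dimensions changes nothing material.
        if max_change < tol:
            break
    return weights


# Made-up example: four sampled persons, two raking dimensions.
records = [
    {"age": "<65", "region": "N", "weight": 1.0},
    {"age": "<65", "region": "S", "weight": 1.0},
    {"age": "65+", "region": "N", "weight": 1.0},
    {"age": "65+", "region": "S", "weight": 1.0},
]
margins = {
    "age": {"<65": 60.0, "65+": 40.0},
    "region": {"N": 70.0, "S": 30.0},
}
weights = rake_weights(records, margins)
print([round(w, 6) for w in weights])
```

In this toy case the raked weights reproduce both sets of control totals at once. A production weighting step would also need to handle empty cells, inconsistent control totals, and weight trimming, none of which the sketch covers.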

Introduction. Collecting data on sanitary and epidemiological well-being requires automated, digital processes. Analysis of foreign experience shows the feasibility of developing domestic specialized software that is better suited to the tasks of social and hygienic monitoring (SGM), including comprehensive analysis of population health indicators or environmental factors across macroregions over long periods. The purpose of the study was to develop a software product that automates the merging of large volumes of data on environmental factors into a combined database.

Materials and methods. The study used the results of environmental factor studies carried out by the Russian Federal Service for Surveillance on Consumer Rights Protection and Human Wellbeing (Rospotrebnadzor) within the framework of the SGM from 2007 to 2019, broken down by individual municipalities of the constituent entities of the Russian Federation that are part of the Russian Arctic.

Results. To build a combined database from separate files in MS Office Excel format, a software product (SP) was developed in Python 3.6 that automates the creation of a database from a large number of separate files sharing a common structure. The SP was tested on the SGM results for municipalities of the constituent entities of the Russian Arctic for 2019. Testing showed that the program works correctly: its output matched results obtained manually. Creating a merged database from 60 source files took 7 minutes on average.

Conclusion. The SP automatically combines a large number of separate Excel files containing standardized data on environmental factors affecting the population, collected as part of the SGM, into a combined database. Rospotrebnadzor institutions can use the software to build combined databases for any territories of the constituent entities of the Russian Federation, for both practical and scientific purposes.
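The core idea in the abstract above, combining many files that share a common structure into one table, can be sketched as follows. This is an illustration under stated assumptions, not the SP itself: it reads CSV files with the Python standard library instead of Excel workbooks (reading `.xlsx` would need a third-party library), and the function name `merge_tables`, the `source_file` provenance column, the file names, and the sample values are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def merge_tables(source_dir, output_path, pattern="*.csv"):
    """Merge all files in source_dir that share one header into a single CSV.

    A "source_file" column is added so each row can be traced to its origin.
    Write the output outside source_dir so it is never picked up as an input.
    Returns the number of data rows written.
    """
    files = sorted(Path(source_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern} in {source_dir}")
    header = None
    writer = None
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        for path in files:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.DictReader(src)
                fieldnames = reader.fieldnames
                if fieldnames is None:
                    raise ValueError(f"{path.name} is empty")
                if header is None:
                    # The first file defines the expected common structure.
                    header = list(fieldnames)
                    writer = csv.DictWriter(out, fieldnames=["source_file"] + header)
                    writer.writeheader()
                elif list(fieldnames) != header:
                    raise ValueError(f"{path.name} does not match the common header")
                for row in reader:
                    row["source_file"] = path.name
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Usage with two tiny made-up input files.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "source"
    src_dir.mkdir()
    (src_dir / "district_a.csv").write_text(
        "year,indicator,value\n2019,nitrates,1.2\n", encoding="utf-8")
    (src_dir / "district_b.csv").write_text(
        "year,indicator,value\n2019,nitrates,0.8\n", encoding="utf-8")
    merged_path = Path(tmp) / "merged.csv"
    n_rows = merge_tables(src_dir, merged_path)
    with open(merged_path, newline="", encoding="utf-8") as f:
        merged_rows = list(csv.DictReader(f))
```

Rejecting any file whose header differs from the first one is a deliberate choice here: it makes a structural mismatch fail loudly rather than silently misaligning columns in the combined database.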

Articles published on Source Files

Hardware-in-the-loop simulator for emulation and active control of chatter

Source files of Python, COMSOL and ANSYS in article: High power and energy density dynamic phase change materials using pressure-enhanced close contact melting

Refactoring for reuse: an empirical study

CodeLabeller: A Web-Based Code Annotation Tool for Java Design Patterns and Summaries

Chemical weathering of sediment (CWS): A web-based application for chemical weathering study of rocks and sediment

Extracting information from (La)TeX source files

Estimation of chemical weather indices of sediments using R program

The Teaching Analysis and Design of the Construction of C Language Multi-Document Project Based on Keil

Data modelling approaches to astronomical data: Mapping large spectral line data cubes to dimensional data models

Evaluation of the quality of the care pathway for patients with multiple sclerosis in France: Results of an original study of a cohort of 700 patients

Understanding the deformation gradient in Abaqus and key guidelines for anisotropic hyperelastic user material subroutines (UMATs)

NOMA Combined With RC for Reliable and Secure Transmission in a Delay-Constrained System

Low-cost SARS-CoV-2 vaccine homogenization system for Pfizer-BioNTech covid-19 vials

Creating from Anywhere with an Artist-Driven Approach to Workstation Deployment

An Open-Source Modular Framework for Automated Pipetting and Imaging Applications.

Digital Variance Angiography in Selective Lower Limb Interventions

Software Application Profile: exposomeShiny—a toolbox for exposome data analysis

Using National Synthetic Data to Conduct Health Services Research

Towards the Software Evolution Recovery at the Level of Software Architecture

Issues of creating an information system for analysis of environmental factors in the Russian Arctic
