Research Objective
Existing administrative health datasets, such as state‐level all‐payer claims datasets and the Healthcare Cost and Utilization Project (HCUP), are useful resources for researchers but have crucial limitations, such as a lack of national representativeness and the inability to track patients across payers (e.g., Medicare, Medicaid, and commercial) and claim types (inpatient, outpatient, and emergency department) with national data. To overcome some of these limitations and to address important use cases for administrative health data, we developed the Synthetic Healthcare Data for Research (SyH‐DR), a nationally representative, partially synthetic, multi‐payer, administrative health dataset.

Study Design
We drew a representative sample of about 20 million beneficiaries covered by Medicare, Medicaid, and a commercial payer. Our source data included hospital‐based services received by these beneficiaries, as well as filled prescriptions for individuals who received hospital services. After harmonizing the data, we constructed person‐level weights with iterative proportional fitting, using control totals from two sources: American Community Survey population counts by key demographic domain at a granular geographic level, and HCUP claims counts by key demographic group and diagnosis. We then employed machine learning methods to create a synthetic version of this dataset, in order to balance analytic utility with patient privacy and to respect constraints imposed by data use agreements.

Population Studied
SyH‐DR is a nationally representative sample of persons who were insured by either a government program (Medicare, Medicaid, or CHIP) or commercial health insurance at any point during 2016.

Principal Findings
We developed a nationally representative, multi‐payer, synthetic claims dataset.
The synthetization methodology that we implemented produced synthetic values for claims‐level variables whose distributions were similar to those in the source data. We confirmed that weighted person‐level estimates were similar between SyH‐DR and benchmark nationally representative surveys, and that distributions of the variables were similar in the de‐identified and source files. The database has a wide variety of uses, including tracking patients over time, comparing demographic and clinical information across granular geographic areas and payers, and analyzing prescription drug and hospital service usage for the same individuals.

Conclusions
Datasets developed in recent years have led to advances in researchers' understanding of population health, care experiences, and healthcare costs in the United States, but no nationally representative all‐payer claims dataset currently exists. SyH‐DR complements existing claims datasets by allowing researchers to track patient experiences across claim types using a nationally representative sample of individuals from Medicare, Medicaid, and commercial payers. As the first nationally representative all‐payer claims database, SyH‐DR will allow researchers to answer many questions regarding public health and healthcare quality that were previously unanswerable or very difficult to answer.

Implications for Policy or Practice
A de‐identified version of the SyH‐DR will be made available to health researchers. In addition, SyH‐DR provides a blueprint for how a representative multi‐payer administrative health dataset can be constructed in a way that balances the needs of various stakeholders, including researchers, patients, and data providers.

Primary Funding Source
Agency for Healthcare Research and Quality.
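
Illustrative Sketch of the Weighting Step
The person‐level weighting described under Study Design relies on iterative proportional fitting (raking). The following is a minimal Python sketch of that general technique, not the SyH‐DR implementation. The function name `rake_weights`, the variable names, the categories, and the control totals are all hypothetical, and only two margins are used here, whereas the study drew its control totals from American Community Survey and HCUP data.

```python
from collections import defaultdict


def rake_weights(records, margins, base_weight=1.0, max_iter=1000, tol=1e-10):
    """Iterative proportional fitting (raking) of person-level weights.

    records: list of dicts, one per sampled person (hypothetical layout).
    margins: dict mapping a variable name to {category: control total}.
    Returns a list of weights aligned with `records`.
    """
    weights = [base_weight] * len(records)
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total in each observed category of this variable.
            totals = defaultdict(float)
            for rec, w in zip(records, weights):
                totals[rec[var]] += w
            # Rescale each weight so the category totals match the controls.
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / totals[rec[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        # Stop once a full pass over all margins barely changes any weight.
        if max_change < tol:
            break
    return weights


# Toy sample: six people described by two hypothetical variables.
records = [
    {"age": "<65", "payer": "commercial"},
    {"age": "<65", "payer": "commercial"},
    {"age": "<65", "payer": "medicaid"},
    {"age": "65+", "payer": "medicare"},
    {"age": "65+", "payer": "medicare"},
    {"age": "65+", "payer": "commercial"},
]

# Made-up control totals (not SyH-DR, ACS, or HCUP figures).
# Both margins sum to the same population total, as raking requires.
margins = {
    "age": {"<65": 60.0, "65+": 40.0},
    "payer": {"commercial": 50.0, "medicaid": 15.0, "medicare": 35.0},
}

weights = rake_weights(records, margins)
```

Each pass rescales the weights to match one margin at a time, so after convergence the weighted counts agree with every set of control totals simultaneously. A production version would also need to handle empty or sparse cells and inconsistent control totals, which this sketch does not attempt.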