Abstract

Purpose: To develop a schema for harmonized, detailed characterization of race and ethnicity data in an electronic health record (EHR)-based cohort linked to cancer registry data, in order to assess disparities in hepatocellular carcinoma (HCC) risk.

Methods: We assembled a cohort of adults with at least one encounter during 2000-2017 by pooling EHR data from three healthcare systems (Kaiser Permanente Hawai’i, Sutter Health, and San Francisco Health Network). EHR data defining sociodemographic and clinical factors were linked to population-based state cancer registries to identify incident HCC (ICD-O-3 site/histology C22.0/8170-8175) through 2017. We extracted and harmonized multiple-response race and ethnicity data across systems and developed schemas to characterize them. We used Cox proportional hazards regression to examine disparities in HCC risk across racial and ethnic groups and compared the results across categorization schemas.

Results: The cohort included 4,248,553 adults, of whom 2,916 developed incident HCC. A race and ethnicity schema that prioritized small populations and the disaggregation of heterogeneous groups defined 16 categories and was used to analyze pooled data from Kaiser Permanente Hawai’i and Sutter Health. With this schema, we observed variation in HCC risk across ethnic groups that are typically aggregated into larger ethno-racial categories (e.g., Asian American). Compared with non-Hispanic (NH) White males, Vietnamese American males had a greater risk of HCC (hazard ratio (HR): 7.42; 95% CI: 4.25, 12.96), as did American Indian/Alaska Native, Black, Chinese American, Hispanic, Native Hawaiian, and Pacific Islander males; Asian Indian, Filipino, Japanese, and Korean American males did not. Among females, every group except American Indian/Alaska Native, Asian Indian American, and Pacific Islander females had a greater risk of HCC than NH White females.
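The abstract does not state how people who report more than one race or ethnicity are assigned to a single category. One common approach for schemas that prioritize small populations is a ranked assignment: each respondent is placed in the highest-ranked group among their harmonized responses. The sketch below illustrates that idea only. The priority order, the `"Unknown"` fallback, and the function name `assign_category` are hypothetical and are not the authors' published rules, and only the categories named in this abstract are shown (not the full 16).

```python
from typing import Iterable

# HYPOTHETICAL priority order (not the authors' rule): smaller or more
# specific groups are listed first, so a multiple-response person is
# assigned to the group that ranks highest. Only categories named in the
# abstract appear here; the full 16-category schema is not reproduced.
PRIORITY = [
    "Native Hawaiian",
    "American Indian/Alaska Native",
    "Pacific Islander",
    "Vietnamese",
    "Korean",
    "Asian Indian",
    "Japanese",
    "Filipino",
    "Chinese",
    "Black",
    "Hispanic",
    "White",  # ranked last, so "White" is assigned only without other known responses
]
RANK = {name: position for position, name in enumerate(PRIORITY)}


def assign_category(responses: Iterable[str]) -> str:
    """Return the highest-priority category among harmonized responses.

    Responses not in PRIORITY are ignored; if none are recognized, the
    placeholder "Unknown" (a hypothetical label) is returned.
    """
    known = [r for r in responses if r in RANK]
    if not known:
        return "Unknown"
    # Lower rank number means higher priority.
    return min(known, key=RANK.__getitem__)


if __name__ == "__main__":
    print(assign_category(["Chinese", "Native Hawaiian"]))  # Native Hawaiian
    print(assign_category(["White", "Hispanic"]))  # Hispanic
```

Because "Hispanic" ranks above "White" in this illustrative order, a person reporting both is never counted in the White group, which is one way to obtain a non-Hispanic White reference category. Any real schema would need its own documented ranking.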
Because San Francisco Health Network data were less granular during our study period, we developed a second, grouped schema by aggregating categories from the detailed schema; it yielded seven race and ethnicity categories and was used to analyze pooled data from all three healthcare systems. Compared with the US Census categorization of race and ethnicity, which also has seven categories, the grouped schema showed different patterns of HCC risk disparities for the smallest groups (i.e., American Indian/Alaska Native and Native Hawaiian).

Conclusions: Analyzing a large EHR-based cohort linked to cancer registry data with a race and ethnicity schema that prioritizes small populations yields valuable knowledge about disparities in HCC risk. To advance research on health disparities using EHR data, researchers must therefore critically assess the racial and ethnic categories typically available from healthcare system data repositories. We offer recommendations for operationalizing EHR race and ethnicity data to facilitate its meaningful use in research.

Citation Format: Mindy C. DeRouen, Caroline A. Thompson, Alison J. Canchola, Alyssa Cortella, Pushkar Inamdar, Janet Chu, Sixiang Nie, Mai Vu, Ma Somsouk, Michele M. Tana, Anna D. Rubinsky, Iona Cheng, Mi-Ok Kim, Mark Segal, Chanda Ho, Yihe G. Daida, Su-Ying Liang, Hashem B. El-Serag, Scarlett L. Gomez, Salma Shariff-Marco. Refined race and ethnicity categories for an EHR-based cohort to study disparities in liver cancer risk [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3943.