Abstract

Background: Cancer patients who participate in clinical trials gain access to novel therapies, increased monitoring, and potentially improved survival. This access is not universal: striking disparities in clinical trial enrollment have been linked to race, sex, and geography. Studies on this topic, however, are limited by a lack of patient-specific socioeconomic data.

Methods: This analysis used a novel linkage of administrative patient data from University Hospitals Seidman Cancer Center (Cleveland, Ohio) to the LexisNexis Socioeconomic Health Attributes, a robust dataset containing 442 patient-specific attributes gathered from public records, covering income, education, housing stability, property ownership and value, social support, and many other domains. We included cancer patients aged 18-75 years at the time of their latest cancer diagnosis (the latest diagnosis before trial enrollment, for enrolled patients) during 2007-2022. Because the trial and non-trial patients differed substantially in demographic and clinical characteristics, we used propensity score matching to match each trial patient one-to-one to a non-trial patient by age, sex, race/ethnicity, cancer type, and year of the latest cancer diagnosis. Next, we applied the classification and regression tree (CART) machine learning algorithm to identify “phenotypes”, or combinations of patient socioeconomic characteristics that predict trial enrollment status. We additionally fit a random forest to determine whether our CART model captured the most important variables for predicting trial enrollment.

Results: We identified 28,671 cancer patients, with a mean age of 60 years and a median age of 62 years. Of those, 2,479 patients were enrolled in trials, and 2,479 non-trial patients were matched to them. CART identified 9 phenotypes of trial enrollment. The percentage of patients enrolled in trials ranged from 30% to 80% across these phenotypes. Patients in the highest-enrollment phenotype had an estimated annual income exceeding $50,000, a higher fraud-risk level, and an age under 61 years. Patients in the lowest-enrollment phenotype had an estimated annual income below $30,000, an estimated home value below $243,000, and no college attendance. The variables identified by CART were also among the top-ranked variables in the random forest ranking of variable importance. Other important variables identified by the random forest included whether the patient had ever registered to vote, the risk of not being motivated to manage one's own health, and business records.

Conclusion: Compared with community-level data, patient-specific socioeconomic data provide a more granular understanding of clinical trial disparities. The machine learning methods CART and random forest offer a promising avenue for future disparities research. This novel approach will deepen our understanding of cancer disparities and strengthen our ability to intervene and improve care for our most vulnerable patients.

Citation Format: Weichuan Dong, Jamie Shoag, Tamila Kindwall-Keller, Ali Kara, John Shanahan, Debora Bruno, Johnie Rose, Siran Koroukian, Richard Hoehn. A novel approach to understanding disparities in cancer clinical trial enrollment [abstract]. In: Proceedings of the 16th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2023 Sep 29-Oct 2;Orlando, FL. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2023;32(12 Suppl):Abstract nr A128.
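
Illustrative sketch (not the authors' code): two steps named in the Methods, one-to-one propensity score matching and the CART split search, can be outlined in plain Python. The abstract does not report the matching algorithm, any caliper, or the tree settings, so everything below is an assumption made for illustration: the function names `match_one_to_one`, `gini`, and `best_split` are hypothetical, the propensity scores are assumed to have been estimated already (for example by a logistic regression on the matching variables), matching is greedy nearest-neighbor without replacement, and the split criterion is Gini impurity.

```python
from typing import Dict, Optional, Sequence, Tuple


def match_one_to_one(
    trial_scores: Dict[str, float],
    control_scores: Dict[str, float],
    caliper: Optional[float] = None,
) -> Dict[str, str]:
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Both arguments map a (hypothetical) patient ID to a propensity score
    that is assumed to have been estimated beforehand.
    """
    available = dict(control_scores)  # controls not yet used
    pairs: Dict[str, str] = {}
    # Process trial patients in order of increasing score (an arbitrary
    # but deterministic choice for this sketch).
    for trial_id, score in sorted(trial_scores.items(), key=lambda kv: kv[1]):
        if not available:
            break
        best_id = min(available, key=lambda cid: abs(available[cid] - score))
        # Optionally reject matches whose scores are too far apart.
        if caliper is not None and abs(available[best_id] - score) > caliper:
            continue
        pairs[trial_id] = best_id
        del available[best_id]  # each control is matched at most once
    return pairs


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = enrolled, 0 = not enrolled)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Find the threshold on one variable that minimizes weighted Gini.

    Returns (threshold, weighted_impurity) for the rule `value < threshold`.
    If no split improves on the unsplit node, the threshold is NaN.
    """
    best = (float("nan"), gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy, made-up inputs (not study data).
trial = {"T1": 0.30, "T2": 0.62, "T3": 0.81}
controls = {"C1": 0.28, "C2": 0.35, "C3": 0.60, "C4": 0.79, "C5": 0.10}
pairs = match_one_to_one(trial, controls)

incomes_k = [22, 28, 35, 48, 55, 70]  # estimated income, thousands of dollars
enrolled = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(incomes_k, enrolled)
```

A full CART fit repeats the split search recursively over all candidate variables within each resulting subgroup. In that reading, each leaf of the tree corresponds to one "phenotype" and the share of enrolled patients in the leaf is its enrollment percentage. A random forest averages many such trees grown on resampled data to rank variables by importance.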