Abstract
Background: Cervical cancer (CC), one of the most preventable malignancies, is best known for its impact on women in developing countries. The mortality-to-incidence ratio (MIR) worldwide is approximately 0.53, versus 0.29 in the US. However, hidden in this figure are women from places such as the predominantly African American community of West Garfield Park in Chicago, which has a CC MIR of 0.63, comparable to that of a developing country. We need simple, versatile tools to identify women at increased risk for late stage at CC diagnosis. Purpose: We test the feasibility of using machine learning to examine individual- and census tract-level predictors of a late-stage CC diagnosis in order to prioritize education, screening, and vaccination. Methods: For this analysis, we used a dataset with 164 CC cases diagnosed at the University of Illinois Cancer Center (UICC) between 2001 and 2018, and 46 individual- and neighborhood-level attributes. We used the recursive partitioning approach with inverse probability weighting to generate a decision tree with a set of logical if-then conditions for predicting late stage at CC diagnosis. Results: The age at CC diagnosis for women in this dataset ranged from 22 to 80, with a mean of 47 years of age. Roughly half of the patients were African American (54%), were ever smokers (47%), had ever been screened for CC (52% compliant, 15% delayed screeners), and had a history of an abnormal Pap result (57%). Overall, 15% of the women were diagnosed at later stages (3, 4 or 5). The estimated accuracy of the fitted model was 91%. Based on the decision tree, the highest predicted probability (71%) of a late-stage CC diagnosis was estimated for the subgroup of women who resided in census tracts where <12% of residents had long work commutes (in excess of 60 minutes), >30% of residents spent over 50% of their household income on rent, and >12% of households were female-headed with children.
Conversely, the lowest predicted probability (6%) of a late-stage CC diagnosis was estimated for women from census tracts where 12% or more of residents reported long work commutes, whose individual BMI was <19, and who lived within 10 miles of UICC. These results align with a hot spot analysis of the same data, which showed two main clusters in neighborhoods on the south and west sides of Chicago with characteristics similar to those derived from the decision tree. Conclusions: The decision tree approach of machine learning generated a simple algorithm that identified subgroups at high risk for late stage at CC diagnosis. Though the results lack generalizability and suffer from inherent overfitting, they have logical validity. Properly trained and tested in a validation set, this approach may be particularly useful to population-level cancer disparity researchers to identify the most relevant individual- and neighborhood-level characteristics and appropriate cut points, uncover hidden vulnerable subpopulations, and partner with communities to inform strategies for interventions.
Citation Format: Katherine Y Tossas, Jenna Khan, Robert A Winn. Hidden figures – an example of using machine learning to prioritize cervical cancer screening outreach [abstract]. In: Proceedings of the Twelfth AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2019 Sep 20-23; San Francisco, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2020;29(6 Suppl_2):Abstract nr A010.
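The two subgroups reported in the Results can be read as if-then rules. The following is a minimal Python sketch of those two leaves only, not the authors' fitted tree: the function name, the argument names, the choice to test the commute split first, and the reading of "within 10 miles" as "at most 10 miles" are all assumptions made for illustration. Every other branch of the tree is unreported in the abstract, so the sketch returns `None` there instead of guessing.

```python
from typing import Optional


def reported_late_stage_probability(
    pct_long_commute: float,
    pct_rent_burdened: Optional[float] = None,
    pct_female_headed: Optional[float] = None,
    bmi: Optional[float] = None,
    miles_to_uicc: Optional[float] = None,
) -> Optional[float]:
    """Return the predicted probability for the two leaves reported in the
    abstract, or None for any subgroup the abstract does not describe.

    Hypothetical helper: names and split order are illustrative assumptions.
    Tract-level inputs are percentages (0-100).
    """
    if pct_long_commute < 12:
        # Highest-risk leaf: <12% long commutes, >30% of residents spending
        # over half of income on rent, >12% female-headed households.
        if (
            pct_rent_burdened is not None
            and pct_female_headed is not None
            and pct_rent_burdened > 30
            and pct_female_headed > 12
        ):
            return 0.71
        return None  # branch not reported in the abstract

    # Lowest-risk leaf: >=12% long commutes, individual BMI < 19,
    # residence within 10 miles of UICC (assumed to mean <= 10).
    if (
        bmi is not None
        and miles_to_uicc is not None
        and bmi < 19
        and miles_to_uicc <= 10
    ):
        return 0.06
    return None  # branch not reported in the abstract


high = reported_late_stage_probability(8, pct_rent_burdened=35, pct_female_headed=20)
low = reported_late_stage_probability(15, bmi=18, miles_to_uicc=5)
```

Here `high` matches the 71% subgroup and `low` the 6% subgroup. Returning `None` for unreported branches keeps the sketch from implying probabilities the abstract does not give.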