Background and aims

Gastric cancer (GC) is a leading cause of cancer incidence and mortality globally. In the United States, population screening is limited by the low incidence and prevalence of GC. A risk prediction algorithm that identifies high-risk patients would allow targeted GC screening. We aimed to determine the feasibility and performance of a logistic regression model based on electronic health records (EHR) for identifying individuals at high risk of non-cardia gastric cancer (NCGC).

Methods

We included 614 patients diagnosed with NCGC between the ages of 40 and 80 years who were seen at our large, multistate tertiary medical center between 2010 and 2021. Controls without a diagnosis of NCGC were randomly selected at a 1:10 case-to-control ratio. Missing data were handled by multiple imputation by chained equations, and logistic regression was then fitted to the imputed datasets to estimate the probability of NCGC. Discrimination was assessed with the area under the curve (AUC), using the 0.632 bootstrap estimator.

Results

The 0.632 estimator of the AUC was 0.731, indicating good discrimination. The probability of NCGC was higher with increasing age (odds ratio [OR] = 1.16; 95% confidence interval [CI]: 1.04-1.30), male sex (OR = 1.97; 95% CI: 1.64-2.36), Black race (OR = 3.07; 95% CI: 2.46-3.83), Asian race (OR = 4.39; 95% CI: 2.60-7.42), tobacco use (OR = 1.61; 95% CI: 1.34-1.94), anemia (OR = 1.35; 95% CI: 1.09-1.68), and pernicious anemia (OR = 6.12; 95% CI: 3.42-10.95).

Conclusion

We demonstrate the feasibility and good performance of an EHR-based logistic regression model for estimating the probability of NCGC. Future studies will refine and validate this model, with the ultimate aim of identifying a high-risk cohort who could be eligible for NCGC screening.
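The Methods name a logistic regression model and the 0.632 estimator of AUC without further detail. The sketch below shows one common way these pieces fit together, under stated assumptions: it uses complete data (the multiple-imputation step and the pooling of estimates across imputed datasets are omitted), applies Efron's .632 weighting to the AUC, fits the model with plain gradient descent rather than a statistics package, and runs on simulated data with hypothetical predictors. It is an illustration of the technique, not the authors' code or pipeline.

```python
import math
import random

def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: chance a random case scores above a random control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Batch gradient descent on mean log-loss for a fixed number of epochs.
    Returns [intercept, b1, ..., bk]; not run to full convergence."""
    k = len(X[0])
    w = [0.0] * (k + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w, X):
    # predicted probability of being a case for each row of X
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def auc_632(X, y, n_boot=30, seed=0):
    """Efron's .632 bootstrap AUC: 0.368 * apparent AUC + 0.632 * mean out-of-bag AUC."""
    rng = random.Random(seed)
    n = len(X)
    apparent = auc(predict(fit_logistic(X, y), X), y)  # train and test on the same data
    oob_aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]  # rows never drawn
        y_oob = [y[i] for i in oob]
        # skip replicates whose training or out-of-bag set lacks one class
        if len({y[i] for i in idx}) < 2 or len(set(y_oob)) < 2:
            continue
        w = fit_logistic([X[i] for i in idx], [y[i] for i in idx])
        oob_aucs.append(auc(predict(w, [X[i] for i in oob]), y_oob))
    if not oob_aucs:
        raise ValueError("no usable bootstrap replicates")
    oob_mean = sum(oob_aucs) / len(oob_aucs)
    return 0.368 * apparent + 0.632 * oob_mean, apparent, oob_mean

# Usage on simulated data; both predictors are hypothetical stand-ins.
rng = random.Random(1)
X, y = [], []
for _ in range(120):
    x1 = rng.gauss(0, 1)               # e.g., a standardized continuous predictor
    x2 = float(rng.random() < 0.3)     # e.g., a binary exposure indicator
    X.append([x1, x2])
    y.append(1 if rng.random() < sigmoid(-1.0 + 0.8 * x1 + 0.7 * x2) else 0)
est, apparent, oob = auc_632(X, y)
print(round(est, 3), round(apparent, 3), round(oob, 3))
```

The apparent AUC is optimistic because the model is scored on the data it was fitted to, while the out-of-bag AUC tends to be pessimistic because each replicate trains on only about 63.2% of the distinct rows; the .632 weighting blends the two. Exponentiating a fitted coefficient, `math.exp(w[j])`, gives the corresponding odds ratio, which is how ORs such as those in the Results are read off a logistic model.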