Abstract

Introduction: Current guidelines for detecting celiac disease (CD) recommend case-finding, that is, targeted testing of individuals considered to be at risk for CD because of associated conditions, signs, and symptoms. Increasing evidence suggests that case-finding is not effective. CD remains largely undiagnosed, and while the benefit of identifying asymptomatic cases is uncertain, detection of symptomatic cases reduces morbidity and mortality. We aimed to use machine learning, a form of artificial intelligence in which programs learn through exposure to new data, to build a model that identifies undiagnosed CD from currently accepted indications to test.

Methods: Blood samples collected from 47,557 individuals (aged 18 to 87.7 years) with no prior diagnosis of CD were tested for tissue transglutaminase antibodies. Samples that were positive or equivocal (defined as ≥3.0 U/mL) were subsequently tested for endomysial antibodies, and individuals whose samples were also positive on this second test were identified as having undiagnosed CD. We selected 408 cases and 408 age- and gender-matched controls; 8 matched pairs were removed because of insufficient data, leaving 400 cases and 400 controls. Medical records were systematically reviewed by a blinded physician for indications to clinically test (Table 1). These indications were used as predictors of the presence of undiagnosed CD. Nine different classifiers, including linear, non-linear, tree-based, and ensemble models, were trained over a large parameter space. Performance was assessed via ten-fold cross-validation.

Table 1. Indications for clinical testing.

Results: Only two of the models, random forest and bagged classification trees, outperformed random chance at the 5% significance level. Both had an area under the receiver operating characteristic curve (AUC) of 0.55, indicating poor discriminatory performance (Figure 1).

Figure 1. Area under the receiver operating characteristic curve (AUC) from out-of-sample cross-validation. Only the bagged tree (treebag) and random forest (rf) models outperform random chance.

Conclusion: This study used machine learning techniques to attempt to develop a model that would identify undiagnosed cases of CD from currently accepted indications to test. Only two models, both with very low predictive power, outperformed random chance in detecting undiagnosed CD. The failure to develop an effective model may arise because the currently used indications to test do not reflect the symptoms of undiagnosed CD, or because these symptoms are not severe enough to come to the attention of care providers. This suggests that case-finding based on symptoms may not be effective and that either new indications to test or alternatives to case-finding should be explored.
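To make the reported AUC of 0.55 easier to interpret, the sketch below computes AUC through its rank interpretation: the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. The function name `auc` and all scores are hypothetical and chosen for illustration only; they are not data from the study, and this is not the authors' evaluation code.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Return the probability that a random case outscores a random control.

    Ties contribute 0.5, which makes this equal to the area under the ROC curve.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores, for illustration only (not study data).
perfect = auc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])        # every case ranks above every control
uninformative = auc([0.5, 0.5], [0.5, 0.5])            # model cannot separate the groups
weak = auc([0.6, 0.4, 0.7, 0.2], [0.5, 0.3, 0.6, 0.1])  # partial separation

print(perfect, uninformative, weak)
```

An AUC of 0.5 corresponds to ranking cases and controls no better than chance, so a value of 0.55 means a case outranks a control only slightly more often than a coin flip would predict.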
