Celiac disease (CeD) is an autoimmune disorder triggered by gluten consumption that involves the immune system and human leukocyte antigen (HLA) in the intestine. Its global incidence ranges from 0.5% to 1%, and only 30% of cases are correctly diagnosed. Diagnosis remains challenging because it requires several complex procedures, such as blood tests, small bowel biopsy, and elimination of gluten from the diet. A faster and more efficient alternative is therefore needed. This study used Extreme Gradient Boosting (XGBoost), an ensemble machine learning technique built on decision trees, to classify Celiac disease. The aim was to classify patients into six classes (potential, atypical, silent, typical, latent, and no disease) based on attributes such as blood test results, clinical symptoms, and medical history. The method employs 5-fold cross-validation to tune four parameters: max depth, n estimators, gamma, and learning rate. Ninety-six experiments were conducted to find the best combination of parameters. Tuning improved accuracy by 0.45 percentage points over the 98.19% obtained with the default XGBoost parameters. The best model was obtained with a max depth of 3, 100 estimators, a gamma of 0, and a learning rate of 0.3 or 0.5, yielding an accuracy of 98.64%, a sensitivity of 98.43%, and a specificity of 99.72%. This research shows that tuning the XGBoost parameters for Celiac
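The tuning procedure described in the abstract (an exhaustive search over four parameters, each combination scored with 5-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The candidate values in `PARAM_GRID` are hypothetical: the abstract reports only that 96 combinations were tried, so the values below were chosen simply so that 4 x 3 x 2 x 4 = 96. The helper names and the `toy_score` placeholder, which stands in for training and scoring an actual XGBoost model, are also assumptions.

```python
from itertools import product
from statistics import mean

# Hypothetical grid: the abstract does not list the candidate values,
# only that 96 combinations were evaluated (4 * 3 * 2 * 4 = 96).
PARAM_GRID = {
    "max_depth": [3, 4, 5, 6],
    "n_estimators": [50, 100, 200],
    "gamma": [0, 1],
    "learning_rate": [0.1, 0.3, 0.5, 0.7],
}


def param_combinations(grid):
    """Yield one dict per point in the Cartesian product of the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def k_fold_indices(n_samples, k=5):
    """Yield k (train, test) index pairs using contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def grid_search_cv(fit_and_score, n_samples, grid, k=5):
    """Return (best_params, best_mean_score) over every grid point.

    fit_and_score(params, train_idx, test_idx) must return the accuracy
    measured on the held-out fold.
    """
    best_params, best_score = None, float("-inf")
    for params in param_combinations(grid):
        fold_scores = [
            fit_and_score(params, train, test)
            for train, test in k_fold_indices(n_samples, k)
        ]
        avg = mean(fold_scores)
        if avg > best_score:  # ties keep the first combination found
            best_params, best_score = params, avg
    return best_params, best_score


def toy_score(params, train_idx, test_idx):
    """Placeholder scorer (not real data): stands in for fitting a model on
    train_idx and returning its accuracy on test_idx."""
    return 1.0 - 0.01 * abs(params["max_depth"] - 3) - 0.01 * params["gamma"]


if __name__ == "__main__":
    print(len(list(param_combinations(PARAM_GRID))), "combinations")
    print(grid_search_cv(toy_score, n_samples=100, grid=PARAM_GRID))
```

In a real experiment, `fit_and_score` would wrap a classifier such as `xgboost.XGBClassifier`, which accepts `max_depth`, `n_estimators`, `gamma`, and `learning_rate`, and the folds would normally be shuffled and stratified so that each of the six classes appears in every fold.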