Abstract

BACKGROUND Epidemiologic stages of inflammatory bowel disease (IBD) have been proposed: 1. Emergence (low incidence and prevalence); 2. Acceleration in Incidence (rapidly rising incidence, low prevalence); and 3. Compounding Prevalence (stabilizing incidence, rapidly rising prevalence). To date, these stages have been theoretical, without quantified definitions of incidence and prevalence.

AIM To use machine learning to determine incidence and prevalence ranges corresponding to the epidemiologic stages and provide stage classifications across time for global regions.

METHODS We built a supervised random forest classifier in R to determine epidemiologic stages of IBD from population-based studies (n=340), a subset derived from a systematic review on the incidence and prevalence of IBD. A labelled training data set comprising rates of incidence and prevalence of Crohn’s disease (CD) and ulcerative colitis (UC) extracted from the systematic review was used to predict classifications of stage 1, stage 2, or stage 3 for each region, stratified by decade (1960–2019). Model accuracy was measured using a blind validation data set. The validated model was then used to predict stage classifications for regions in the data set. Interquartile ranges for incidence and prevalence of CD and UC were calculated from the random forest output, and the distributions were compared using negative binomial regression.

RESULTS The random forest’s classification accuracy on the blinded validation data was 93.7% (95% CI: 90.6, 96.1), indicating an appropriate model fit and performance. Significant differences were found between all stages for the incidence and prevalence of CD and UC (p<0.001).
The clear distinction across stages defines the incidence and prevalence ranges (25th–75th percentile, per 100,000) for IBD as: CD incidence 0.0–0.3, UC incidence 0.2–0.7, CD prevalence 0.3–2.2, and UC prevalence 1.7–8.1 for stage 1; CD incidence 1.0–4.4, UC incidence 2.3–6.3, CD prevalence 9.0–33.9, and UC prevalence 22.8–73.3 for stage 2; and CD incidence 6.6–14.0, UC incidence 10.1–18.1, CD prevalence 163.2–274.7, and UC prevalence 189.1–323.2 for stage 3 (Figure 1). A decade-by-decade analysis shows global regions transitioning across the epidemiologic stages (Figure 2). By the 2010s, North America, Scandinavia, Western Europe, Australia, and New Zealand were in stage 3. Most regions in Asia and Latin America were in stage 1 in the last half of the 20th century, with many transitioning to stage 2 in the 2010s.

DISCUSSION Temporal incidence and prevalence data show that regions transition across epidemiologic stages. Numerical definitions of the epidemiologic stages can be used to anticipate growth in the burden of IBD by providing estimates of the incident and prevalent IBD cases a region can expect as it transitions between epidemiologic stages in the future.

Figure 1 Coalescing ranges for incidence (panel A) and prevalence (panel B) of Crohn’s disease and ulcerative colitis at epidemiologic stage 1, stage 2, and stage 3. Data were categorized by data type (incidence or prevalence), disease type (Crohn’s disease or ulcerative colitis), and epidemiologic stage, as per results from the random forest classifier. The 25th and 75th percentiles were calculated using the rates across all regions included in the analysis for all available time points for each box group.

Figure 2 Global maps depicting epidemiologic stages of IBD evolution from 1960 to 2019 broken down by decade, as predicted by the random forest model.
Panels contain stage classifications by decade: panel A, 1960 to 1969; panel B, 1970 to 1979; panel C, 1980 to 1989; panel D, 1990 to 1999; panel E, 2000 to 2009; and panel F, 2010 to 2019.
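
The stage ranges reported in RESULTS, and the burden estimates described in DISCUSSION, can be illustrated with a minimal Python sketch. It is not the authors' method: the study used a random forest classifier built in R, whereas this sketch only checks whether a rate falls inside a reported interquartile range and converts a rate per 100,000 into a case count. The names `STAGE_IQR`, `stage_for_rate`, `expected_cases`, and `expected_case_range` are hypothetical, and rates that fall in the gaps between ranges are deliberately left unclassified.

```python
from typing import Optional, Tuple

# Interquartile ranges (25th-75th percentile, per 100,000) copied from RESULTS.
# These are descriptive ranges, not the decision rules of the random forest.
STAGE_IQR = {
    1: {"cd_incidence": (0.0, 0.3), "uc_incidence": (0.2, 0.7),
        "cd_prevalence": (0.3, 2.2), "uc_prevalence": (1.7, 8.1)},
    2: {"cd_incidence": (1.0, 4.4), "uc_incidence": (2.3, 6.3),
        "cd_prevalence": (9.0, 33.9), "uc_prevalence": (22.8, 73.3)},
    3: {"cd_incidence": (6.6, 14.0), "uc_incidence": (10.1, 18.1),
        "cd_prevalence": (163.2, 274.7), "uc_prevalence": (189.1, 323.2)},
}


def stage_for_rate(measure: str, rate: float) -> Optional[int]:
    """Return the stage whose reported IQR contains `rate`, or None.

    Assumption: a simple range lookup. Rates between ranges (for example
    a CD incidence of 0.5) return None rather than being forced into a stage.
    """
    for stage, ranges in STAGE_IQR.items():
        low, high = ranges[measure]
        if low <= rate <= high:
            return stage
    return None


def expected_cases(rate_per_100k: float, population: int) -> float:
    """Convert a rate per 100,000 into an expected number of cases."""
    return rate_per_100k * population / 100_000


def expected_case_range(stage: int, measure: str,
                        population: int) -> Tuple[float, float]:
    """Expected cases at the 25th and 75th percentile rates for a stage."""
    low, high = STAGE_IQR[stage][measure]
    return expected_cases(low, population), expected_cases(high, population)
```

For a hypothetical region of 1,000,000 people, `expected_case_range(3, "cd_prevalence", 1_000_000)` gives roughly 1,632 to 2,747 prevalent CD cases, which is the stage 3 CD prevalence range scaled by the population.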