Schistosomiasis is a neglected tropical disease affecting over 150 million people. Hotspots of Schistosoma transmission (communities where infection prevalence does not decline adequately with mass drug administration) present a key challenge in eliminating schistosomiasis. Current approaches to identifying hotspots require evaluation 2 to 5 years after a baseline survey and subsequent mass drug administration. Here, we develop statistical models that predict hotspots at baseline, prior to treatment, using epidemiologic, survey-based, and remote sensing data, and we compare three common hotspot definitions. In a reanalysis of randomized trials in 589 communities in five endemic countries, a regression model predicts whether Schistosoma mansoni infection prevalence will exceed the WHO threshold of 10% in year 5 ("prevalence hotspot") with 86% sensitivity, 74% specificity, and 93% negative predictive value (NPV; assuming 30% hotspot prevalence), and a regression model for Schistosoma haematobium achieves 90% sensitivity, 90% specificity, and 96% NPV. A random forest model predicts whether S. mansoni moderate and heavy infection prevalence will exceed a public health goal of 1% in year 5 ("intensity hotspot") with 92% sensitivity, 79% specificity, and 96% NPV, and a boosted trees model for S. haematobium achieves 77% sensitivity, 95% specificity, and 91% NPV. Baseline prevalence is a top predictor in all models. Prediction is less accurate in countries not represented in the training data and for a third hotspot definition based on relative prevalence reduction over time ("persistent hotspot"). These models may serve as a tool to prioritize high-risk communities for more frequent surveillance or intervention against schistosomiasis, but prediction of hotspots remains a challenge.
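The NPV figures above depend on the assumed 30% hotspot prevalence as well as on sensitivity and specificity. The sketch below illustrates that dependence using the standard Bayes' rule relationship. It is a minimal illustration only: the abstract does not state how NPV was computed, and the function name `negative_predictive_value` is hypothetical. Because the inputs are the rounded percentages reported above, recomputed values may differ from the reported NPV by about a percentage point.

```python
def negative_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Return NPV from sensitivity, specificity, and prevalence (all as proportions in [0, 1])."""
    # Expected share of communities that are non-hotspots and correctly predicted negative.
    true_negatives = specificity * (1.0 - prevalence)
    # Expected share of communities that are hotspots but wrongly predicted negative.
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# S. mansoni prevalence-hotspot model: 86% sensitivity, 74% specificity,
# with the 30% hotspot prevalence assumed in the abstract.
npv_mansoni = negative_predictive_value(0.86, 0.74, 0.30)
print(f"{npv_mansoni:.3f}")  # 0.925, consistent with the reported 93% after rounding
```

At a fixed sensitivity and specificity, a lower assumed hotspot prevalence yields a higher NPV, which is why the prevalence assumption is reported alongside the NPV.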