Abstract

The US government invests substantial sums to control the HIV/AIDS epidemic. To monitor progress toward epidemic control, the President’s Emergency Plan for AIDS Relief (PEPFAR) oversees a data reporting system that includes standard indicators, reporting formats, information systems, and data warehouses. These data, reported quarterly, inform understanding of the global epidemic, resource allocation, and identification of trouble spots. PEPFAR has developed tools to assess the quality of the reported data. These tools have made important contributions but are limited in the methods they use to identify anomalous data points. The most advanced tool considers univariate probability distributions, whereas correlations between indicators suggest that a multivariate approach is better suited. For temporal analysis, the same tool compares values to the averages of preceding periods but does not consider underlying trends or seasonal factors. To address these limitations, we apply two methods to identify anomalous data points among routinely collected facility-level HIV/AIDS data. The first is recommender systems, an unsupervised machine learning method that captures relationships between users and items. We apply the approach in a novel way by predicting reported values, comparing predicted to reported values, and identifying the greatest deviations. For a temporal perspective, we apply time series models that are flexible enough to include trend and seasonality. Results of these methods were validated against manual review (95% agreement on non-anomalies and 56% agreement on anomalies for recommender systems; 96% agreement on non-anomalies and 91% agreement on anomalies for time series). The resulting tool will bring greater methodological sophistication to data quality monitoring in an accelerated and standardized manner.
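The two ideas described above (predict each reported value, then flag the largest deviations) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the models, so a truncated SVD stands in for the recommender system, and a least-squares fit with a linear trend and quarterly dummies stands in for the time series model. The function names, the synthetic data, and the injected anomalies are all assumptions made for illustration.

```python
import numpy as np


def low_rank_residuals(reported, rank=1):
    """Predict each cell from a rank-`rank` reconstruction (a simple stand-in
    for a recommender system) and return reported minus predicted values."""
    col_mean = reported.mean(axis=0)
    u, s, vt = np.linalg.svd(reported - col_mean, full_matrices=False)
    predicted = (u[:, :rank] * s[:rank]) @ vt[:rank] + col_mean
    return reported - predicted


def seasonal_trend_residuals(series, period=4):
    """Fit a linear trend plus one intercept per season by least squares
    and return the residuals (reported minus fitted values)."""
    t = np.arange(len(series))
    season = np.eye(period)[t % period]      # one-hot season indicators
    design = np.column_stack([t, season])    # trend column + seasonal intercepts
    coef, *_ = np.linalg.lstsq(design, series, rcond=None)
    return series - design @ coef


# Synthetic facility-by-indicator matrix (hypothetical data): each indicator
# is a fixed fraction of facility size, so the indicators are correlated.
facility_size = np.linspace(50.0, 500.0, 30)
indicator_ratio = np.array([1.0, 0.8, 0.6, 0.3])
reported = np.outer(facility_size, indicator_ratio)
reported[7, 2] += 300.0                      # injected anomaly (assumption)

cell_residuals = low_rank_residuals(reported, rank=1)
flagged_cell = tuple(
    int(k) for k in np.unravel_index(np.abs(cell_residuals).argmax(),
                                     cell_residuals.shape)
)

# Synthetic quarterly series for one facility: trend plus seasonal pattern.
quarters = np.arange(20)
seasonal_pattern = np.array([5.0, -3.0, 0.0, -2.0])
series = 100.0 + 2.0 * quarters + seasonal_pattern[quarters % 4]
series[13] += 40.0                           # injected anomaly (assumption)

quarter_residuals = seasonal_trend_residuals(series, period=4)
flagged_quarter = int(np.abs(quarter_residuals).argmax())

print("largest cross-indicator deviation at (facility, indicator):", flagged_cell)
print("largest temporal deviation at quarter index:", flagged_quarter)
```

In both cases the flagged point is simply the one with the largest absolute residual. A real workflow would more likely rank deviations or apply a threshold and pass the candidates to manual review, as in the validation the abstract reports.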
