Legionella pneumophila (L. pneumophila) is a pathogenic bacterium best known for causing Legionnaires’ disease, which carries high mortality rates, particularly among the elderly. With caseloads continuing to increase, further research is needed to improve our understanding of optimized sampling schemes and safe limits of L. pneumophila, in part to target improved treatment options and realistic population-level risk modeling. These needs are crucial and time-sensitive, particularly in healthcare and other high-risk settings. We therefore conceived this research as a means of combining easily measured physicochemical water quality parameters with a generalization of the unique ecology of building water systems to build a computational model that allows more rapid and accurate decision making. This research uses machine learning (ML) methods grounded in statistical learning theory to incorporate the concentration of host cells, such as native amoebae, and physicochemical water quality parameters to estimate the probability of observing ranges of Legionella gene copy concentrations. Using data from previously published research on Legionella prevalence in a large building, our ML method trains the model on the relative impacts of physicochemical parameters on likely amoeba host cell occurrences. The model is then extended to estimate host cell concentrations using correlations and regressions fitted with LASSO algorithms. After categorization, variables from these results are used to inform a logistic regression that estimates the probability of Legionella gene copy concentration ranges. In summary, conventional logistic regression and multiple linear regression quantified the associations among ecological conditions in the water and demonstrated the ability to predict a likely range of Legionella concentrations in a management-focused way.
Further, two ML methods, principal component analysis (PCA) and LASSO, demonstrated the feasibility of accurate real-time monitoring of Legionella through physicochemical indicators, as evidenced by the good accuracy of predictions in validation. The results also demonstrate the vital need to account for the impact of building water quality on host cells and, through them, on the quantified microbial ecology of the water, not just on Legionella concentrations.