Activated sludge (AS) microbial communities are influenced by many environmental variables. However, comprehensively analyzing how these variables jointly and nonlinearly shape the AS microbial community remains challenging. In this study, we used machine learning techniques to elucidate the collective effects of environmental variables on the structure and function of AS microbial communities. Applying Dirichlet multinomial mixtures analysis to 311 global AS samples, we identified four distinct microbial community types (AS-types), each characterized by a unique microbial composition and metabolic profile. We compared 14 classical linear and nonlinear machine learning methods to select a baseline model. Extremely randomized trees performed best at learning the relationship between environmental factors and AS-types (accuracy of 71.43%). Feature selection identified the critical environmental factors and ranked their importance: latitude (Lat), longitude (Long), precipitation during sampling (Precip), solids retention time (SRT), effluent total nitrogen (Effluent TN), average temperature during the sampling month (Avg Temp), mixed liquor temperature (Mixed Temp), influent biochemical oxygen demand (Influent BOD), and annual precipitation (Annual Precip). Notably, Lat, Long, Precip, Avg Temp, and Annual Precip influenced metabolic variation among AS-types. These findings emphasize the pivotal role of environmental variables in shaping microbial community structure and influencing metabolic pathways within activated sludge. Our study encourages the application of machine learning techniques to design artificial activated sludge microbial communities for specific environmental purposes.
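As an illustration of the community-typing step, the following minimal sketch shows how a sample of taxon counts could be assigned to one of several community types under a Dirichlet multinomial mixture, once the mixture has been fitted. It is not the study's pipeline: the function names, mixture weights, and Dirichlet parameters are hypothetical toy values, and the fitting step itself (typically expectation-maximization) is omitted.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: n! / prod(x_i!)
    log_p = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Ratio of Dirichlet normalizing constants
    log_p += math.lgamma(a) - math.lgamma(n + a)
    log_p += sum(
        math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha)
    )
    return log_p


def responsibilities(counts, weights, alphas):
    """Posterior probability of each mixture component for one sample."""
    log_joint = [
        math.log(w) + dm_log_pmf(counts, al) for w, al in zip(weights, alphas)
    ]
    # Log-sum-exp for numerical stability
    m = max(log_joint)
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_joint))
    return [math.exp(v - log_norm) for v in log_joint]


def assign_type(counts, weights, alphas):
    """Index of the most probable community type for one sample."""
    r = responsibilities(counts, weights, alphas)
    return max(range(len(r)), key=r.__getitem__)


# Hypothetical toy mixture: two community types over three taxa.
toy_weights = [0.5, 0.5]
toy_alphas = [[10.0, 1.0, 1.0], [1.0, 1.0, 10.0]]
toy_sample = [8, 1, 0]  # counts dominated by the first taxon

print(assign_type(toy_sample, toy_weights, toy_alphas))
```

In this toy setting each component's Dirichlet parameters encode which taxa tend to dominate a community type, so a sample is assigned to the type whose expected composition best matches its counts.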