Outdoor ambient sound levels can be predicted with machine-learning models trained on geospatial and acoustic data. To improve modeling robustness, the median of the sound levels predicted by tuned models from several different classes of supervised machine learning has been calculated. This ensemble-based model reduces errors at the training sites, for both overall levels and spectra, and produces more physically reasonable predictions elsewhere. Furthermore, the spread of the ensemble members' predictions provides an estimate of the modeling accuracy. An initial analysis of feature-importance metrics suggests that the number of geospatial inputs can be reduced from 120 to 15 without significant degradation of the model's predictive error, as measured by leave-one-out cross validation. However, predictions from the reduced-feature model may be less physical in certain regions when all of the geospatial features that differentiate those regions are removed. These results suggest the need for more sophisticated data collection and validation methods.
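The ensemble idea above (take the median prediction across several model classes, use their spread as an accuracy indicator, and score everything with leave-one-out cross validation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: the three toy model classes (training mean, 1-nearest-neighbor, a least-squares line on one feature), the helper names, the use of the max-min range as the "spread", and the two-feature data are all hypothetical stand-ins for the tuned supervised models and the 120 geospatial inputs described in the abstract.

```python
import math
import statistics
from typing import Callable, List, Sequence

# Hypothetical types: a "fit" function trains one model class on
# (features, levels) and returns a predictor for a new feature vector.
Predictor = Callable[[Sequence[float]], float]
FitFn = Callable[[List[Sequence[float]], List[float]], Predictor]


def fit_mean(X: List[Sequence[float]], y: List[float]) -> Predictor:
    """Toy baseline model class: always predict the training-set mean level."""
    m = statistics.fmean(y)
    return lambda x: m


def fit_nearest_neighbor(X: List[Sequence[float]], y: List[float]) -> Predictor:
    """Toy 1-nearest-neighbor model class (squared Euclidean distance)."""
    def predict(x: Sequence[float]) -> float:
        dists = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in X]
        return y[dists.index(min(dists))]
    return predict


def fit_linear_first_feature(X: List[Sequence[float]], y: List[float]) -> Predictor:
    """Toy model class: least-squares line fitted on the first feature only."""
    xs = [xi[0] for xi in X]
    mx, my = statistics.fmean(xs), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = 0.0 if sxx == 0 else sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sxx
    return lambda x: my + slope * (x[0] - mx)


def make_median_ensemble(fits: List[FitFn]) -> FitFn:
    """Combine several model classes into one fit function whose prediction
    is the median of the member predictions."""
    def fit(X: List[Sequence[float]], y: List[float]) -> Predictor:
        members = [f(X, y) for f in fits]
        return lambda x: statistics.median([p(x) for p in members])
    return fit


def ensemble_spread(fits: List[FitFn], X, y, x) -> float:
    """Range (max - min) of member predictions at x, used as a rough accuracy proxy."""
    preds = [f(X, y)(x) for f in fits]
    return max(preds) - min(preds)


def loocv_rmse(fit: FitFn, X: List[Sequence[float]], y: List[float]) -> float:
    """Leave-one-out cross validation: refit without site i, then predict site i."""
    sq_errs = []
    for i in range(len(y)):
        predictor = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        sq_errs.append((predictor(X[i]) - y[i]) ** 2)
    return math.sqrt(statistics.fmean(sq_errs))


def keep_features(X: List[Sequence[float]], indices: List[int]) -> List[List[float]]:
    """Reduce each feature vector to the chosen columns (e.g. the top-ranked features)."""
    return [[xi[j] for j in indices] for xi in X]


if __name__ == "__main__":
    # Made-up toy data: two geospatial features per site, level in dBA.
    X = [(0.1, 5.0), (0.3, 4.0), (0.5, 3.0), (0.7, 2.0), (0.9, 1.0), (0.2, 4.5)]
    y = [35.0, 41.0, 47.0, 52.0, 58.0, 38.0]
    members = [fit_mean, fit_nearest_neighbor, fit_linear_first_feature]
    ensemble = make_median_ensemble(members)
    for name, fit in [("mean", fit_mean), ("1-NN", fit_nearest_neighbor),
                      ("linear", fit_linear_first_feature), ("median ensemble", ensemble)]:
        print(f"{name:16s} LOOCV RMSE = {loocv_rmse(fit, X, y):.2f} dB")
    print("reduced (feature 0 only):",
          f"{loocv_rmse(ensemble, keep_features(X, [0]), y):.2f} dB")
    print("spread at a new site:", ensemble_spread(members, X, y, (0.4, 3.5)))
```

The median is used rather than the mean because a single badly behaved member cannot drag it arbitrarily far, which matches the robustness motivation in the abstract. `keep_features` only shows how a reduced input set would be re-scored with the same leave-one-out procedure; how the features are ranked (the feature-importance metric) is not specified here.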