Soil health encompasses a range of biological, chemical, and physical soil properties that sustain the commercial and ecological value of agroecosystems. Monitoring soil health requires a comprehensive set of diagnostics that can be cost-prohibitive for routine analyses. The soil microbiome, which can be assayed in a high-throughput, cost-effective way, provides a rich source of information about soil properties. We evaluated the accuracy of random forest (RF) and support vector machine (SVM) regression and classification models in predicting 12 measures of soil health, tillage status, and soil texture from 16S rRNA gene amplicon data, using an operationally relevant sample set. We validated the best-performing models against independent datasets and also tested best practices for processing microbiome data for use in machine learning. Soil health metrics could be predicted from microbiome data, with the best models achieving a Kappa value of ∼0.65 for categorical assessments and an R² value of ∼0.8 for numerical scores. Biological health ratings were better predicted than chemical or physical ratings. Validation with independent datasets revealed that models had general predictive value for soil properties, including yield. The ecological profiles of several taxa important for model accuracy matched the observed relationships with soil health, including Pyrinomonadaceae, Nitrososphaeraceae, and Candidatus Udeaobacter. Models trained at the highest taxonomic resolution proved most accurate, with losses in accuracy resulting from rarefying, sparsity filtering, and aggregating at higher taxonomic ranks. Our study provides the groundwork for developing scalable technology to use microbiome-based diagnostics for the assessment of soil health.
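For readers unfamiliar with the two accuracy measures reported above, the following is a minimal sketch of how Cohen's Kappa (for categorical ratings) and R² (for numerical scores) are commonly computed. It is an illustration only, not the authors' code: the function names are hypothetical, and the R² shown is the coefficient of determination, whereas the study may have used a different definition (for example, the squared correlation between observed and predicted values).

```python
import math
from collections import Counter


def cohen_kappa(observed, predicted):
    """Agreement between two label sequences, corrected for chance agreement."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Observed agreement: fraction of samples where the labels match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Expected agreement if labels were assigned independently,
    # using each sequence's own class frequencies.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        # Both sequences contain a single identical class: Kappa is undefined.
        return math.nan
    return (p_o - p_e) / (1.0 - p_e)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        # Observed values are constant: R^2 is undefined.
        return math.nan
    return 1.0 - ss_res / ss_tot
```

A Kappa of 0 means agreement no better than chance and 1 means perfect agreement; an R² of 1 means predictions match the observed scores exactly, while 0 means they do no better than always predicting the mean.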