The accumulation of cadmium (Cd) in rice grains is a significant threat to human health. The complexity of the soil-rice system, with its numerous influencing parameters, makes it necessary to identify the factors that most strongly drive Cd accumulation. This study uses machine learning (ML) modeling to predict Cd accumulation in rice grains and to identify those factors. A dataset of 474 data points compiled from 77 published works was analyzed, and eight ML models were established using different algorithms. Total soil Cd concentration (TS Cd) and extractable Cd concentration (Ex-Cd) served as input variables, and rice Cd concentration (Cdrice) was the output variable. Among the models, the Extremely Randomized Trees (ERT) model performed best (TS Cd: R² = 0.825; Ex-Cd: R² = 0.792), followed by Random Forest (TS Cd: R² = 0.721; Ex-Cd: R² = 0.719). The ERT feature importance ranking showed that, with total soil Cd as the input variable, the most important factors for Cd accumulation are cation exchange capacity (CEC), TS Cd, Water Management Model (WMM), and pH. With extractable Cd as the input variable, the most important factors are CEC, Ex-Cd, pH, and WMM. The study highlights the importance of the Water Management Model and its impact on Cd concentration in rice grains, which has been overlooked in previous research.
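The abstract ranks the models by R² but does not state how the score was computed. As a minimal sketch, assuming the standard coefficient of determination (one minus the ratio of the residual sum of squares to the total sum of squares), the metric can be written as below. The function name `r_squared` and the sample values are hypothetical illustrations, not data or code from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed standard definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error between observations and predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Made-up illustrative grain Cd values, NOT data from the study.
observed_cd = [0.10, 0.20, 0.30, 0.40]
predicted_cd = [0.12, 0.18, 0.33, 0.37]
score = r_squared(observed_cd, predicted_cd)
```

Under this definition, a score of 1 means the predictions reproduce the observations exactly, and a score near 0 means the model does no better than predicting the mean observed value.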