Abstract
The relationship between the cadmium (Cd) concentration in rice grains and that in the soil in which the rice is cultivated is highly uncertain owing to the influence of soil properties, rice varieties, and other undetermined factors. In this study, we introduce the probability of exceeding the threshold to characterize this uncertainty and then build a probabilistic forewarning model. A number of associated factors are also used as parameters to improve model performance. Because the physicochemical properties of the soil and its Cd concentration (Cdsoil) are neither normally distributed nor independent of each other, a discriminative algorithm, represented by logistic regression (LR), performed better than generative algorithms such as naive Bayes and quadratic discriminant analysis. The LR-based model performed 0.5% better in the univariate case (Cdsoil only) and 4.1% better in the multivariate case (soil properties used as additional factors) (p < 0.01). The probabilities predicted by the LR-based model were positively correlated with the true exceedance rate (R² = 0.949, p < 0.01) over an exceedance threshold range of 0.1–0.4 mg kg⁻¹, with a mean deviation of 5.75%. A sensitivity analysis showed that the effect of soil properties on the exceedance probability weakens as the Cd concentration in rice grains increases. When the threshold is below 0.15 mg kg⁻¹, soil pH strongly influences the exceedance probability. As the threshold increases, the influence of pH on the exceedance probability is gradually superseded. By quantifying the uncertainty in the relationship between Cd concentrations in rice grains and soil, the discriminative algorithm-based probabilistic forecasting model offers a new way to assess Cd pollution in rice grown in contaminated paddy fields.
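To make the idea of an exceedance probability concrete, the following is a minimal sketch of a logistic regression that maps soil Cd and soil pH to the probability that grain Cd exceeds a threshold. It is not the authors' implementation. The data are synthetic, and the generating relation, the 0.2 mg kg⁻¹ threshold, the choice of features, and all function names (`features`, `make_synthetic_data`, `fit_logistic`, `exceedance_probability`) are assumptions made purely for illustration.

```python
import math
import random

THRESHOLD = 0.2  # mg/kg; illustrative value inside the 0.1-0.4 range, an assumption


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(cd_soil, ph):
    """Hypothetical feature map: log10 of soil Cd and pH centred at 6."""
    return [math.log10(cd_soil), ph - 6.0]


def make_synthetic_data(n=200, seed=0):
    """Generate made-up (features, label) pairs; this is NOT the study's data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        cd_soil = 10 ** rng.uniform(-1.0, 0.5)  # soil Cd in mg/kg
        ph = rng.uniform(4.5, 8.0)
        # Assumed relation: grain Cd rises with soil Cd and falls with pH.
        log_cd_rice = (-0.7 + 0.8 * math.log10(cd_soil)
                       - 0.4 * (ph - 6.0) + rng.gauss(0.0, 0.2))
        X.append(features(cd_soil, ph))
        y.append(1 if 10 ** log_cd_rice > THRESHOLD else 0)
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=1500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def exceedance_probability(w, b, cd_soil, ph):
    """Predicted probability that grain Cd exceeds THRESHOLD."""
    x = features(cd_soil, ph)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


X, y = make_synthetic_data()
w, b = fit_logistic(X, y)
p_high = exceedance_probability(w, b, cd_soil=2.0, ph=5.0)  # acidic, Cd-rich soil
p_low = exceedance_probability(w, b, cd_soil=0.2, ph=7.5)   # alkaline, low-Cd soil
print(f"P(exceed) acidic, high-Cd soil:  {p_high:.3f}")
print(f"P(exceed) alkaline, low-Cd soil: {p_low:.3f}")
```

Because the model is discriminative, it estimates the exceedance probability directly from the inputs without assuming that they are normally distributed or independent, which is the property the abstract credits for LR outperforming the generative alternatives. Refitting with a different `THRESHOLD` would give one model per exceedance level.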