Arsonic acids (RAsO(OH)2), prevalent in contaminated food, water, air, and soil, pose significant environmental and health risks because their variable ionization states influence key properties such as lipophilicity, solubility, and membrane permeability. Accurate pKa prediction for these compounds is critical yet challenging, as existing models often perform poorly across diverse chemical spaces. This study presents a comparative analysis of pKa predictions for arsonic acids using a support vector machine-based machine learning (ML) approach and three density functional theory (DFT)-based models. The DFT models evaluated are correlations to the maximum surface electrostatic potential (V_S,max), correlations to atomic charges computed with the solvation model based on density (SMD), and a scaled solvent-accessible surface method. The scaled solvent-accessible surface approach yielded high mean unsigned errors, making it less effective than the other approaches. In contrast, the atomic charge-based method applied to the conjugate arsonate base provided the most accurate predictions. The ML-based approach demonstrated strong predictive performance, suggesting its potential utility in broader chemical spaces. The pKa values obtained from V_S,max were predicted poorly, because this approach relates pKa only to the electrostatic character of the molecule. In practice, pKa is influenced by many factors, including molecular structure, solvation, resonance, inductive effects, and local atomic environments. V_S,max cannot fully capture these interactions, as it gives only a simplified view of the overall molecular potential field.
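To make the comparison criteria above more concrete, the following is a minimal sketch of how a descriptor-to-pKa correlation and its mean unsigned error might be computed. It assumes a simple linear relation between a single descriptor (such as V_S,max or an atomic charge) and pKa; the abstract does not state the functional form used in the study, and the function names and the toy numbers below are purely illustrative, not data or code from the paper.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def mean_unsigned_error(predicted, reference):
    """Average absolute deviation between predicted and reference values."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


# Toy placeholder values (NOT real descriptors or measured pKa values);
# they lie exactly on a line so the fit is easy to check by hand.
toy_descriptor = [0.0, 1.0, 2.0, 3.0]
toy_pka = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(toy_descriptor, toy_pka)
predicted = [slope * x + intercept for x in toy_descriptor]
mue = mean_unsigned_error(predicted, toy_pka)
```

In a real workflow the descriptor values would come from the DFT calculations and the reference values from experiment, and the error would normally be evaluated on compounds not used in the fit.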