The rising demand from consumer goods and pharmaceutical industry is driving a fast expansion of newly developed chemicals. The conventional toxicity testing of unknown chemicals is expensive, time-consuming, and raises ethical concerns. The quantitative structure–property relationship (QSPR) is an efficient computational method because it saves time, resources, and animal experimentation. Advances in machine learning have improved chemical analysis in QSPR studies, but the real-world application of machine learning-based QSPR studies was limited by the unexplainable ‘black box’ feature of the machine learnings. In this study, multi-encoder structure-to-toxicity (S2T)-transformer based QSPR model was developed to estimate the properties of polychlorinated biphenyls (PCBs) and endocrine disrupting chemicals (EDCs). Simplified molecular input line entry systems (SMILES) and molecular descriptors calculated by the Dragon 6 software, were simultaneously considered as input of QSPR model. Furthermore, an attention-based framework is proposed to describe the relationship between the molecular structure and toxicity of hazardous chemicals. The S2T-transformer model achieved the highest R2 scores of 0.918, 0.856, and 0.907 for logarithm of octanol-water partition coefficient (Log KOW), octanol-air partition coefficient (Log KOA), and bioconcentration factor (Log BCF) estimation of PCBs, respectively. Moreover, the attention weights were able to properly interpret the lateral (meta, para) chlorination associated with PCBs toxicity and environmental impact.
Read full abstract