Assessing chemical toxicity in materials such as plastic packaging is critical to safeguarding public health. This study presents classification-based machine learning models that predict the toxicity of chemicals associated with plastic packaging. Using an extensive dataset of chemical structures, we trained four types of model (Random Forest, Support Vector Machine, Linear Discriminant Analysis, and Logistic Regression) on seven endpoints: Neurotoxicity, Hepatotoxicity, Dermatotoxicity, Carcinogenicity, Reproductive Toxicity, Skin Sensitization, and Toxic Pneumonitis. The dataset was pre-processed by selecting 2D molecular descriptors as feature inputs, and resampling methods were applied to balance the classes: over-sampling (ADASYN, Borderline SMOTE, Random Over-Sampler, SVMSMOTE) and under-sampling (Cluster Centroid, Near Miss, Random Under-Sampler). Model hyperparameters were tuned by grid search with five-fold cross-validation. Model performance was evaluated using accuracy (Acc), sensitivity (Se), specificity (Sp), and area under the receiver operating characteristic curve (AUC-ROC). In most cases, model accuracy was 0.8 or above for both the training and test sets. Additionally, SHAP (SHapley Additive exPlanations) values were used for feature importance analysis, highlighting the descriptors that contribute most to the toxicity predictions. The models were ranked using the Sum of Ranking Differences (SRD) method to systematically select the most effective model for each endpoint. The optimal models demonstrated high predictive accuracy and interpretability, providing a scalable and efficient alternative to traditional toxicity assessment methods. This approach offers a valuable tool for rapidly screening potentially hazardous chemicals in plastic packaging.
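The four evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the 0.5 decision threshold, the label convention (1 = toxic, 0 = non-toxic), and the toy labels and scores are all assumptions made for illustration. AUC-ROC is computed here through its pairwise-ranking interpretation, the fraction of (toxic, non-toxic) pairs in which the toxic compound receives the higher score.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, assuming 1 = toxic, 0 = non-toxic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_score, threshold=0.5):
    """Compute Acc, Se, Sp and AUC-ROC from true labels and predicted scores.

    Assumes both classes are present in y_true; otherwise Se, Sp or AUC
    would involve a division by zero.
    """
    # Threshold the scores to get hard class predictions (threshold is an assumption).
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)

    acc = (tp + tn) / len(y_true)  # fraction of all compounds classified correctly
    se = tp / (tp + fn)            # sensitivity: toxic compounds correctly flagged
    sp = tn / (tn + fp)            # specificity: non-toxic compounds correctly cleared

    # AUC-ROC as the probability that a random toxic compound outscores a
    # random non-toxic one; ties count as half a win.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"Acc": acc, "Se": se, "Sp": sp, "AUC": auc}


# Toy labels and scores, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
metrics = classification_metrics(y_true, y_score)
print(metrics)  # Acc = Se = Sp = 0.75, AUC = 0.9375
```

On imbalanced data, accuracy alone can look high even when the minority class is rarely detected, which is one reason to report Se and Sp alongside Acc and to balance the classes before training.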