Developing a generalized model for robust nanotoxicity prediction is critical for designing safe nanoparticles. However, the complex toxicity mechanisms of nanoparticles in biological environments, such as biomolecular corona formation, hinder reliable prediction. The problem is exacerbated by the evaluation bias that internal validation can introduce, a risk that is not fully appreciated. Herein, we propose an evidence-based method that combines literature data mining and machine learning to distinguish cytotoxic from noncytotoxic nanoparticles under a given condition. We illustrate the method for amorphous silica nanoparticles (SiO2-NPs), which are currently considered a safety concern yet are still widely produced and used in various consumer products. Using >100 publications, we compiled the most diverse set of SiO2-NP cellular toxicity attributes to date and built predictive models with algorithms ranging from linear to nonlinear (deep neural network, kernel, and tree-based) classifiers. These models were validated using internal (4124-sample) and external (905-sample) data sets. The resultant categorical boosting (CatBoost) model outperformed the other algorithms. Using SHapley Additive exPlanations (SHAP) values in the CatBoost model, we then identified 13 key attributes that can explain SiO2-NP toxicity, including concentration, serum, cell, size, time, surface, and assay type. The serum attribute underscores the importance of nanoparticle-corona complexes for nanotoxicity prediction. We further show that internal validation does not guarantee generalizability. In general, safe SiO2-NPs can be obtained by modifying their surfaces and using low concentrations. Our work provides a strategy for predicting and explaining the toxicity of any type of engineered nanoparticle in real-world practice.
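
To make the attribution idea concrete, the following is a minimal sketch of how Shapley values split a single prediction among input attributes. It is not the CatBoost model or the SHAP implementation used in this work: `toy_score`, its weights, and the three boolean attributes (`high_concentration`, `serum_free`, `surface_modified`) are hypothetical, chosen only to mirror the kinds of attributes named in the abstract. The sketch enumerates every attribute ordering exactly, which is feasible only for a handful of attributes.

```python
from itertools import permutations


def toy_score(features):
    """Hypothetical toxicity score (illustrative weights, not fitted to data)."""
    score = 0.0
    if features["high_concentration"]:
        score += 0.5
    if features["serum_free"]:
        score += 0.2
    # Assumed interaction: a high dose is worse when no serum is present.
    if features["high_concentration"] and features["serum_free"]:
        score += 0.2
    if features["surface_modified"]:
        score -= 0.3
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)      # start from the reference condition
        previous = model(current)
        for name in order:
            current[name] = instance[name]   # switch one attribute to its real value
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


# Reference condition: low dose, serum present, unmodified surface.
baseline = {"high_concentration": False, "serum_free": False, "surface_modified": False}
# Condition to explain: high dose, no serum, modified surface.
instance = {"high_concentration": True, "serum_free": True, "surface_modified": True}

phi = shapley_values(toy_score, instance, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
```

In this toy setting the attributions sum to the difference between the score for the explained condition and the score for the reference condition, and the interaction term is shared equally between the two attributes involved. Tree-based SHAP implementations compute the same kind of quantity with algorithms that avoid enumerating all orderings.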