Abstract
In toxicity evaluation based on nuclear receptor signalling pathways, in silico prediction tools are used to detect the early stages of long-term toxicities, to prioritize newly synthesized chemicals, and to achieve adequate selectivity and sensitivity. Computational prediction models are a promising tool for screening the toxicity of chemical-protein interactions, as deep learning has improved prediction accuracy. The challenge, however, is data imbalance: the toxic chemical compound dataset is much smaller than the nontoxic dataset, which results in low prediction accuracy for the toxic compounds, the very class that provides valid information about toxicity hazard. In this paper, we examine the effect of data imbalance in toxicity assessment data for the nuclear receptors AR (LBD), ER (LBD), AhR, and PPAR, and identify a severe imbalance between prediction performance on the toxic and nontoxic datasets. Because the assessment of toxicity hazards requires balanced selectivity and sensitivity, we investigate data resampling methods to reduce this bias in binary classification for toxicity hazard profiling of nuclear receptors. With simple resampling methods, the experiments achieved a sensitivity of 0.714 and a specificity of 0.787, with an overall accuracy of 0.829 and a ROC-AUC of 0.822.
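As an illustration of the kind of simple resampling discussed above, the following minimal Python sketch shows random oversampling of the minority (toxic) class together with the sensitivity and specificity computation. The abstract does not state which resampling methods were used, so random oversampling is an assumption here, and the function names `random_oversample` and `sensitivity_specificity` as well as the label convention (1 = toxic, 0 = nontoxic) are hypothetical choices made for this example.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class.

    Assumption: this is one common "simple resampling method"; the
    abstract does not say which method the authors actually used.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # All original samples belonging to this class.
        pool = [x for x, y in zip(features, labels) if y == cls]
        # Draw with replacement until the class reaches the target size.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity), assuming 1 = toxic, 0 = nontoxic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data: four nontoxic compounds and one toxic compound.
    X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    y = [0, 0, 0, 0, 1]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
    print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a sketch like this, resampling would normally be applied only to the training split, so that sensitivity and specificity are measured on held-out data that keeps its original class ratio.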