In silico prediction of toxic action mechanisms of phenols for imbalanced data with Random Forest learner

Jing Chen, Yuan Yan Tang, Bin Fang, Chang Guo

doi:10.1016/j.jmgm.2012.01.002

Abstract

With an increasing need for rapid and effective safety assessment of compounds in industrial and civil-use products, in silico toxicity exploration techniques provide an economical approach to environmental hazard assessment. Over the last decade, in silico research has produced many quantitative structure–activity relationship models for predicting toxicity mechanisms. Most of these methods rely on data analysis and machine learning techniques, whose performance depends heavily on the characteristics of the data sets. For Tetrahymena pyriformis toxicity data sets, a major technical challenge is data imbalance: the skewed class distribution greatly degrades prediction performance on rare classes. Most previous studies on predicting the mechanisms of toxic action of phenols did not consider this practical problem. In this work, we addressed it by treating the two types of misclassification differently. A Random Forest learner was employed in a cost-sensitive learning framework to construct prediction models based on selected molecular descriptors. In computational experiments, both the global and local models achieved appreciable overall prediction accuracy, and performance on the rare classes in particular improved. Moreover, for practical use of these models, the balance between the two types of misclassification can be adjusted by using different cost matrices according to the application goals.
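The role of the cost matrix can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-class setting, takes class probabilities as given (for example, averaged votes from some ensemble), and applies a standard minimum-expected-cost decision rule. The function names, probabilities, and cost values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: minimum-expected-cost decision for two classes.
# The cost values and probabilities below are hypothetical, not from the paper.

def expected_costs(probs, cost_matrix):
    """Return the expected cost of predicting each class.

    probs[i] is the estimated probability that the true class is i.
    cost_matrix[i][j] is the cost of predicting class j when the truth is i.
    """
    n_classes = len(probs)
    return [
        sum(probs[i] * cost_matrix[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]


def min_cost_class(probs, cost_matrix):
    """Pick the class whose prediction has the lowest expected cost."""
    costs = expected_costs(probs, cost_matrix)
    return min(range(len(costs)), key=costs.__getitem__)


# Class 0 = common class, class 1 = rare class (hypothetical labels).
probs = [0.7, 0.3]  # assumed class probabilities for one compound

equal_costs = [[0, 1], [1, 0]]  # both kinds of error cost the same
rare_costly = [[0, 1], [5, 0]]  # missing a rare-class compound costs 5x

print(min_cost_class(probs, equal_costs))
print(min_cost_class(probs, rare_costly))
```

With equal costs the compound is assigned to the common class; once a missed rare-class compound is penalised more heavily, the same probabilities lead to the rare class. Changing the off-diagonal entries is how the trade-off between the two kinds of error would be tuned in a sketch like this one.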

Journal: Journal of Molecular Graphics and Modelling
Publication Date: Jan 17, 2012
Citations: 19
