Abstract

Aromatic and heteroaromatic amines are widely used as industrial chemicals and are also found in cooked foods and tobacco smoke. In this study, Quantitative Structure–Activity Relationships (QSARs) are developed that correlate the observed carcinogenic activities of 80 aromatic and heteroaromatic amines. Principal Component Regression and stepwise linear regression were applied to construct QSAR models. Both models perform slightly better than previously reported models built on the same dataset with multiple linear regression. To improve performance further, Support Vector Regression (SVR) was used to construct the QSARs, and a Genetic Algorithm (GA) was used to select the most informative descriptors. Additionally, by introducing a weighting technique into the model, a new SVR variant, the optimized sample‐weighted SVR, is proposed; the optimal weighting coefficient is 0.2. The results suggest that GA-based descriptor selection and the weighting technique can effectively improve the performance of the SVR models. The best Root Mean Square Error in Prediction (RMSEP) is 0.799, which is smaller than those of the other models. A jackknife testing procedure was used to validate the models. The results indicate that GA descriptor selection and the weighting technique are important and necessary for improving the performance of SVR-based QSAR models.
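The abstract names jackknife validation and the RMSE in prediction without spelling out how they are computed. The following is a minimal sketch of a leave-one-out (jackknife) RMSEP loop under stated assumptions: a single descriptor per compound, and a weighted least-squares line standing in for the paper's SVR. The function names `fit_line`, `rmse`, and `jackknife_rmsep` and the per-sample `weights` argument are hypothetical illustrations of how sample weights enter a fit in general. They are not the authors' implementation or their sample-weighting scheme, which the abstract describes only by its optimal coefficient (0.2).

```python
import math
from typing import Callable, List, Optional, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float],
             weights: Optional[Sequence[float]] = None) -> Callable[[float], float]:
    """Weighted least-squares fit of y = a + b*x.

    Assumption: a stand-in for the paper's SVR model, using one descriptor x
    and optional per-sample weights w_i (uniform if omitted).
    """
    w = list(weights) if weights is not None else [1.0] * len(xs)
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw  # weighted mean of x
    my = sum(wi * y for wi, y in zip(w, ys)) / sw  # weighted mean of y
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted activities."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def jackknife_rmsep(xs: Sequence[float], ys: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Leave-one-out (jackknife) RMSE in prediction.

    Each compound is held out in turn, the model is refit on the remaining
    compounds, and the held-out activity is predicted; all held-out errors
    are pooled into a single RMSE.
    """
    n = len(xs)
    w = list(weights) if weights is not None else [1.0] * n
    preds: List[float] = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        model = fit_line([xs[j] for j in keep], [ys[j] for j in keep], [w[j] for j in keep])
        preds.append(model(xs[i]))
    return rmse(ys, preds)


# Usage with made-up toy data (not from the study):
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.9, 5.2, 6.8, 9.1]
print(jackknife_rmsep(xs, ys))
```

Because every prediction is made for a compound the model never saw, the jackknife RMSEP is an out-of-sample error and is typically larger than the in-sample RMSE of a model fit to all compounds.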
