Pre-processing Feature Selection for Improved C&RT Models for Oral Absorption

Danielle Newby, Taravat Ghafourian, Alex A Freitas

doi:10.1021/ci400378j

Abstract

There are currently thousands of molecular descriptors that can be calculated to represent a chemical compound. Using all of them in Quantitative Structure-Activity Relationship (QSAR) modeling can result in overfitting and decreased interpretability, and thus in reduced model performance. Feature selection methods can overcome some of these problems by drastically reducing the number of molecular descriptors and selecting those relevant to the property being predicted. In particular, decision trees such as classification and regression trees (C&RT), although they have an embedded feature selection algorithm, can be inadequate: further down the tree fewer compounds are available for descriptor selection, so the descriptors chosen there may not be optimal. In this work we compare two broad approaches to feature selection: (1) a "two-stage" procedure, in which a pre-processing feature selection method selects a subset of descriptors and C&RT then selects descriptors from this subset to build a decision tree; and (2) a "one-stage" approach, in which C&RT is the only feature selection technique. These methods were applied to improve the prediction accuracy of QSAR models for oral absorption. Additionally, this work uses misclassification costs during model building to counter the imbalance of oral absorption data sets, which contain more highly absorbed than poorly absorbed compounds. In most cases the two-stage approach with pre-processing gave higher model accuracy than the one-stage approach. Using the top 20 molecular descriptors ranked by the random forest predictor importance method gave the most accurate C&RT classification model. The molecular descriptors selected by the five filter feature selection methods are also compared in relation to oral absorption.
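
The contrast between the two approaches can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not name the five filter methods (apart from random forest predictor importance), so the filter used here, absolute Pearson correlation between each descriptor and a 0/1 absorption class, is an assumed stand-in. Stage 2 shows only the root-node split search of a C&RT-style tree (Gini impurity over midpoint thresholds), not a full tree. The function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = compounds, columns = descriptors


def abs_pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def filter_top_k(X: Matrix, y: List[int], k: int) -> List[int]:
    """Stage 1 (assumed filter): keep the k descriptors most correlated with y."""
    scores = [(abs_pearson([row[j] for row in X], y), j) for j in range(len(X[0]))]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by index
    return [j for _, j in scores[:k]]


def gini(labels: List[int]) -> float:
    """Gini impurity of a node holding 0/1 class labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(X: Matrix, y: List[int], candidates: List[int]) -> Tuple[int, float, float]:
    """Stage 2 (C&RT-style root split): search only `candidates` for the
    (descriptor, threshold) pair minimising the size-weighted child Gini."""
    n = len(y)
    best = (-1, 0.0, float("inf"))
    for j in candidates:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical toy data: 6 compounds x 3 descriptors, y = 1 for "high absorption".
X = [[0.1, 5.0, 2.0], [0.2, 1.0, 2.1], [0.3, 4.0, 1.9],
     [0.9, 2.0, 2.0], [1.0, 6.0, 2.2], [1.1, 3.0, 1.8]]
y = [0, 0, 0, 1, 1, 1]

subset = filter_top_k(X, y, k=2)                      # two-stage: filter first...
two_stage = best_split(X, y, subset)                  # ...then split within subset
one_stage = best_split(X, y, list(range(len(X[0]))))  # one-stage: all descriptors
```

On this tiny set both approaches choose the same root split. The difference the abstract describes appears in deeper nodes of a full tree. There, few compounds remain, and in the one-stage approach a descriptor can win a split by chance. In the two-stage approach, every node is restricted to the pre-filtered subset.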
In conclusion, the use of filter pre-processing feature selection methods and misclassification costs produces models with better interpretability and predictability for oral absorption.
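
The abstract says misclassification costs were used to offset the excess of highly absorbed compounds, but it does not give the cost values. One way costs enter C&RT is through leaf class assignment: a leaf is labelled with the class that has the lowest expected misclassification cost rather than with the majority class. The minimal sketch below shows that rule. The 5:1 cost ratio, the leaf counts and the function names are hypothetical.

```python
from typing import Dict

Costs = Dict[str, Dict[str, float]]  # cost[true_class][predicted_class]


def expected_cost(counts: Dict[str, int], cost: Costs, pred: str) -> float:
    """Total cost of labelling every compound in the leaf as `pred`."""
    return sum(n * cost[true][pred] for true, n in counts.items())


def leaf_label(counts: Dict[str, int], cost: Costs) -> str:
    """Class assignment that minimises expected misclassification cost.
    Iterating over sorted labels makes tie-breaking deterministic."""
    return min(sorted(cost), key=lambda pred: expected_cost(counts, cost, pred))


# Hypothetical leaf: 8 highly absorbed and 2 poorly absorbed training compounds.
counts = {"high": 8, "low": 2}

# Equal costs: behaves like plain majority voting.
unit = {"high": {"high": 0, "low": 1}, "low": {"high": 1, "low": 0}}

# Illustrative 5:1 cost: calling a poorly absorbed compound "high" is 5x worse.
skewed = {"high": {"high": 0, "low": 1}, "low": {"high": 5, "low": 0}}

print(leaf_label(counts, unit))    # high  (cost 2 for "high" vs 8 for "low")
print(leaf_label(counts, skewed))  # low   (cost 10 for "high" vs 8 for "low")
```

With equal costs the leaf follows the majority of highly absorbed compounds. With the skewed costs the same leaf is labelled "low". This reflects how costs can stop an imbalanced training set from pushing every uncertain leaf toward the majority class.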

Journal: Journal of Chemical Information and Modeling
Publication Date: Oct 9, 2013
Citations: 24

Similar Papers

A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class
Qianwu Ni ... Lei Chen
Combinatorial Chemistry & High Throughput Screening | VOL. 20
23 Oct 2017

A Systematic Review of Feature Selection Techniques in Software Quality Prediction
Hadeel Alsolai ... Marc Roper
01 Nov 2019

An experimental comparison of feature selection methods on two-class biomedical datasets
P Drotár ... Z Smékal
Computers in Biology and Medicine | VOL. 66
24 Aug 2015

Deep Neural Network Feature Selection Approaches for Data-Driven Prognostic Model of Aircraft Engines
Phattara Khumprom ... Nita Yodo
Aerospace | VOL. 7
04 Sep 2020
