Abstract

Recent advances in machine learning tools and algorithms have influenced many fields, including drug discovery. Research once conducted through trial-and-error experiments has largely been replaced by computational approaches. This growth has prompted considerable development in synthesizing chemical data to support chemoinformatics research. One of the most widely used tools for modeling chemoinformatics problems is the Quantitative Structure-Activity Relationship (QSAR) model. Earlier QSAR models dealt with small datasets and a limited number of features. Current QSAR datasets suffer from high dimensionality, where the number of features exceeds the number of records. Over the years, the curse of dimensionality has been a major shortcoming of QSAR classification models. Linear Principal Component Analysis (PCA) is a popular feature extraction method used to reduce the high dimensionality of QSAR datasets. However, QSAR datasets are highly complex and require a deeper understanding of feature representation. The autoencoder, a type of neural network, has not been fully explored in QSAR modeling for dimensionality reduction. In this research, we investigate the impact of an autoencoder on a high-dimensional QSAR dataset. The autoencoder's performance is compared with that of PCA on overall accuracy. Our preliminary analysis demonstrates that the proposed technique outperforms PCA.
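As a rough illustration of the two dimensionality reduction approaches compared here, the following minimal sketch reduces a synthetic dataset with more features than records using both linear PCA and a small autoencoder. Everything in it is an assumption made for illustration and not the setup used in this research: the synthetic data stands in for a QSAR dataset, and the function names `pca_reduce` and `train_autoencoder`, the single tanh hidden layer, the code size of 5, and the learning rate are all hypothetical choices.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X onto its top-k principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def train_autoencoder(X, k, epochs=500, lr=0.01, seed=0):
    """Train a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    with full-batch gradient descent on mean squared reconstruction error.
    Returns the encoder function and the loss recorded at each epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, k))
    b1 = np.zeros(k)
    W2 = rng.normal(0.0, 0.1, (k, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encode to k dimensions
        E = H @ W2 + b2 - X                # reconstruction error
        losses.append(float(np.mean(E ** 2)))
        # Backpropagate the mean squared error through both layers.
        dR = 2.0 * E / E.size
        dZ = (dR @ W2.T) * (1.0 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
        W2 -= lr * (H.T @ dR)
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)

    def encode(A):
        return np.tanh(A @ W1 + b1)

    return encode, losses

# Synthetic stand-in for a high-dimensional dataset (assumption):
# 40 records, 200 features, generated from 5 nonlinear latent factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(40, 5))
mixing = 0.5 * rng.normal(size=(5, 200))
X = np.tanh(latent @ mixing) + 0.1 * rng.normal(size=(40, 200))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature

Z_pca = pca_reduce(X, 5)                   # linear 5-dimensional representation
encode, losses = train_autoencoder(X, 5)
Z_ae = encode(X)                           # nonlinear 5-dimensional representation
```

In a comparison like the one described above, a classifier would then be trained on each reduced representation and the resulting accuracies compared; that step is omitted from this sketch.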

