Probabilistic classifier: generated using randomised sub-sampling of the feature space

Jonathan D Tyzack, Robert C Glen, Hamse Y Mussa

doi:10.1186/1758-2946-4-s1-p40


Open Access

https://doi.org/10.1186/1758-2946-4-s1-p40


Abstract

Supervised classification, based on the concept of pattern recognition, is now an integral part of virtual screening. The central idea of supervised classification in chemoinformatics is to design a classification algorithm that accurately assigns a new molecule to one of a set of predefined classes. Probabilistic classifiers can be far more useful than hard point classifiers for decision problems [1], such as virtual screening, where there is a risk associated with assigning an instance to one class or the other. Because of their conceptual simplicity and computational efficiency, probabilistic classification methods based on the Naive Bayes concept are widely employed in chemoinformatics. The simplicity of Naive Bayes stems from the assumption that the descriptors representing the molecule to be classified are statistically independent. Unfortunately, it is well documented that when the molecular descriptors are binary-valued (taking values of 0 or 1), which is often the case in chemoinformatics, the Naive Bayesian classifier can act only as a linear classifier in the descriptor space. Techniques such as the Parzen-Window approach can address this shortcoming, but they are computationally expensive because they require the entire training dataset to be retained in core memory [2,3]. To address these drawbacks, a new probabilistic classifier is proposed that uses randomised sub-sampling of the descriptor space. The proposed algorithm generates better class membership predictions than its Naive Bayesian counterpart when classifying molecules that are not linearly separable in descriptor space. We present a realistic test of the new method by classifying large chemical datasets generated from the ChEMBL database [4].
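The linearity limitation can be illustrated on a toy problem. For binary descriptors, the Naive Bayes log-odds between two classes is a weighted sum of the descriptor values plus a constant, so a class defined by the XOR of two descriptors cannot be recovered. The sketch below compares a Bernoulli Naive Bayes model with a simple ensemble that estimates class probabilities jointly over randomly chosen descriptor subsets. The abstract does not specify the proposed algorithm, so the ensemble here is only an assumed illustration of the general idea and not the authors' method; all function names, the toy data, the subset size, and the smoothing constant are hypothetical.

```python
import itertools
import math
import random


def fit_naive_bayes(X, y, alpha=1.0):
    """Bernoulli Naive Bayes with Laplace smoothing: returns (prior, theta)."""
    prior, theta = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = len(rows) / len(X)
        # theta[c][j] estimates P(descriptor j = 1 | class c).
        theta[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                    for j in range(len(X[0]))]
    return prior, theta


def predict_naive_bayes(model, x):
    """Posterior class probabilities under the independence assumption."""
    prior, theta = model
    score = {}
    for c in prior:
        p = prior[c]
        for xj, t in zip(x, theta[c]):
            p *= t if xj else (1.0 - t)
        score[c] = p
    z = sum(score.values())
    return {c: s / z for c, s in score.items()}


def fit_subspace(X, y, subset_size=2, n_subsets=3, alpha=1.0, seed=0):
    """Assumed sketch: one joint count table per random descriptor subset."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    # Draw distinct random subsets; cap at the number that exist.
    wanted = min(n_subsets, math.comb(n_feat, subset_size))
    subsets = set()
    while len(subsets) < wanted:
        subsets.add(tuple(sorted(rng.sample(range(n_feat), subset_size))))
    members = []
    for idx in subsets:
        counts = {}
        for x, label in zip(X, y):
            cell = counts.setdefault(tuple(x[j] for j in idx), {})
            cell[label] = cell.get(label, 0) + 1
        members.append((idx, counts))
    return sorted(set(y)), members, alpha


def predict_subspace(model, x):
    """Average the smoothed per-subset class probabilities."""
    classes, members, alpha = model
    avg = {c: 0.0 for c in classes}
    for idx, counts in members:
        cell = counts.get(tuple(x[j] for j in idx), {})
        total = sum(cell.values()) + alpha * len(classes)
        for c in classes:
            avg[c] += (cell.get(c, 0) + alpha) / total / len(members)
    return avg


# Toy data: three binary descriptors, class = XOR of the first two.
patterns = list(itertools.product([0, 1], repeat=3))
X = [p for p in patterns for _ in range(10)]
y = [p[0] ^ p[1] for p in X]

query = (0, 1, 0)  # true class is 1
nb_probs = predict_naive_bayes(fit_naive_bayes(X, y), query)
sub_probs = predict_subspace(fit_subspace(X, y), query)

print(round(nb_probs[1], 3))   # 0.5: Naive Bayes cannot separate XOR
print(round(sub_probs[1], 3))  # 0.652: the joint table over (0, 1) helps
```

In this toy setting only the subset containing both informative descriptors contributes a confident estimate, while the other two subsets return 0.5, so the averaged probability is pulled toward the correct class without being sharp. How subsets are sized, sampled, and combined in the actual method is not stated in the abstract.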



Journal: Journal of Cheminformatics
Publication Date: May 1, 2012
Citations: 1
License type: CC BY 2.0


