Abstract

Developing effective computational methods that accurately identify and predict biological activity in the virtual screening of bioassay data is important for speeding up drug development. One such method, the multi-criteria optimization classifier (MCOC), seeks a trade-off between two criteria: the overlapping degree of the different classes, which should be minimized, and the total distance from the input points to the decision hyperplane, which should be maximized. A decision function is derived from the training data and then used to predict the class label of unseen instances. However, outliers, anomalies, highly imbalanced classes, high dimensionality, nonlinear separability and other uncertainties in the data often cause MCOC and other methods to perform poorly. In this paper, we assign each input point a new fuzzy contribution based on its class median, define new row and column kernel functions as linear combinations of different feature kernels to replace the single kernel function in the kernel-induced feature space, and introduce penalty factors for imbalanced classes. Together these yield a novel multi-kernel multi-criteria optimization classifier with fuzzification and penalty factors (MK–MCOC–FP), which significantly reduces the effects of the problems listed above.
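The three ingredients can be illustrated with a minimal Python sketch. It is not the MK–MCOC–FP formulation itself, whose exact definitions the abstract does not give; every helper name and formula here is a hypothetical stand-in. The sketch assumes (1) a fuzzy membership s_i = 1 − d_i / (r_c + δ), where d_i is the Euclidean distance from x_i to the coordinate-wise median of its class c and r_c is the largest such distance in c (the distance-based membership familiar from fuzzy SVM, with the class median in place of the class mean); (2) per-class penalty factors inversely proportional to class size; and (3) a combined kernel formed as a nonnegative weighted sum of one Gaussian kernel per feature, standing in for the paper's row and column kernels.

```python
import math
from statistics import median


def class_medians(X, y):
    """Coordinate-wise median of the points belonging to each class."""
    medians = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        medians[label] = [median(col) for col in zip(*rows)]
    return medians


def fuzzy_memberships(X, y, delta=1e-3):
    """Hypothetical membership s_i = 1 - d_i / (r_c + delta).

    d_i is the distance from x_i to its class median and r_c is the largest
    such distance within class c, so points near the median get values close
    to 1 and far-away points (likely outliers) get values close to 0.
    """
    medians = class_medians(X, y)
    dists = [math.dist(x, medians[lab]) for x, lab in zip(X, y)]
    radius = {}
    for d, lab in zip(dists, y):
        radius[lab] = max(radius.get(lab, 0.0), d)
    return [1.0 - d / (radius[lab] + delta) for d, lab in zip(dists, y)]


def class_penalties(y):
    """Hypothetical penalty factor per class: n / (k * n_c), so rarer classes weigh more."""
    n, labels = len(y), set(y)
    return {lab: n / (len(labels) * y.count(lab)) for lab in labels}


def multi_kernel(x, z, betas, gamma=1.0):
    """Weighted sum of one Gaussian kernel per feature:
    sum_j beta_j * exp(-gamma * (x_j - z_j)^2), with beta_j >= 0."""
    return sum(b * math.exp(-gamma * (a - c) ** 2) for b, a, c in zip(betas, x, z))


if __name__ == "__main__":
    # Toy data: five "inactive" points (-1), one of them an outlier at (9, 9),
    # and two "active" points (+1) forming the minority class.
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [9.0, 9.0], [5.0, 5.0], [6.0, 5.0]]
    y = [-1, -1, -1, -1, -1, 1, 1]
    s = fuzzy_memberships(X, y)
    C = class_penalties(y)
    # Per-point weight that would scale each slack term in a soft-margin objective.
    weights = [C[lab] * s_i for lab, s_i in zip(y, s)]
    print("memberships:", [round(v, 4) for v in s])
    print("penalties:", C)
    print("weights:", [round(w, 4) for w in weights])
    print("K(x0, x1):", multi_kernel(X[0], X[1], betas=[0.5, 0.5]))
```

In a fuzzy, cost-sensitive soft-margin objective, the slack of each point would be weighted by the product of its class penalty and its membership, so the outlier at (9, 9) contributes little while errors on the two-member minority class cost more. One known side effect of this membership formula is that the farthest point of every class receives a value near δ / (r_c + δ), which is why δ is kept small but positive.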
Experimental results on predicting active compounds in virtual screening, compared with linear and quadratic MCOCs, support vector machines (SVM), fuzzy SVM and a neural network, show that MK–MCOC–FP clearly improves resistance to noise interference, predictive accuracy on highly class-imbalanced bioassay data, separation of active from inactive compounds, interpretability of the importance or contribution of different features to classification, classification efficiency on high-dimensional data through feature selection or dimensionality reduction, and generalization when predicting the biological activity of new compounds.
