Algorithmic Generation of Positive Samples for Compound-Target Interaction Prediction

Ebenezer Nanor, Wei-Ping Wu, Brighter Agyemang, Victor K Agbesi, Strato Angsoteng Bayitaa

doi:10.1145/3457682.3457689

Abstract

Machine Learning (ML) methods have become the preferred computational methods for Compound-Target Interaction (CTI) prediction in small-molecule drug development in Bioinformatics, because they have proven to be very efficient. However, the extremely imbalanced nature of CTI datasets presents a major challenge when ML methods are used to predict CTIs. To a large extent, these methods misclassify the minority samples, i.e. the positive samples, which are precisely the ones of most interest to those involved in drug development. In this study, we aim to improve the performance of ML-based methods for CTI prediction, particularly on the positive samples, by addressing the challenge of class imbalance. We applied deep generative modeling to oversample selected positive samples from the original dataset in order to construct balanced datasets. The oversampling process followed two approaches: a General-based approach and a novel Domain Specific-based approach. In the experimental section, three Deep Learning (DL) methods and six classical ML methods were trained on the original imbalanced dataset and on the two constructed balanced datasets to investigate their performance in predicting CTIs. To ensure the robustness of the ML-based predictive methods, a Grid Search with 5-fold Cross Validation (CV) was performed to estimate the best hyperparameters for training. The Convolutional Neural Network (CNN) produced the most competitive results in predicting positive samples, as evaluated with the Recall metric.
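To illustrate why Recall, rather than accuracy, is the informative metric on a heavily imbalanced CTI dataset, the following minimal sketch compares the two on made-up labels. The class ratio (10 positives in 1000 pairs), the two hypothetical classifiers, and the helper names `recall` and `accuracy` are assumptions chosen purely for illustration; they are not taken from the study's data or code.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos if actual_pos else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical imbalanced dataset: 10 interacting pairs among 1000.
y_true = [1] * 10 + [0] * 990

# Hypothetical classifier A: labels every pair as non-interacting.
y_all_negative = [0] * 1000

# Hypothetical classifier B: recovers 8 of the 10 positives,
# at the cost of 50 false positives among the negatives.
y_sensitive = [1] * 8 + [0] * 2 + [1] * 50 + [0] * 940

acc_a, rec_a = accuracy(y_true, y_all_negative), recall(y_true, y_all_negative)
acc_b, rec_b = accuracy(y_true, y_sensitive), recall(y_true, y_sensitive)

print(f"A: accuracy={acc_a:.3f} recall={rec_a:.3f}")
print(f"B: accuracy={acc_b:.3f} recall={rec_b:.3f}")
```

In this toy setting classifier A scores higher on accuracy while finding none of the positive samples, whereas classifier B scores lower on accuracy but recovers most of them; Recall exposes the difference that matters for the minority class.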

Full Text