Abstract
Identifying potential drug targets is a crucial task in the drug-discovery pipeline. Computational prediction methods can speed up the identification of candidate drug targets across entire genomes. In the current work we have developed a sequence-based prediction method for discriminating human drug target proteins from human non-drug target proteins. The predictive models are trained on sequence-based features such as amino acid composition, amino acid property group composition, and dipeptide composition. The classification of human drug target proteins is a classic example of class imbalance. We have addressed this issue by using SMOTE (Synthetic Minority Over-sampling Technique) as a preprocessing step to balance the training data at a 1:1 ratio between drug targets (minority samples) and non-drug targets (majority samples). Using the Rotation Forest ensemble classifier, with the ReliefF feature-selection technique to select the optimal subset of salient features, the best model achieves 87.1% sensitivity, 83.6% specificity, and 85.3% accuracy, with a Matthews correlation coefficient (MCC) of 0.71, in a stratified tenfold cross-validation test. The identified subset of optimal features may help in assessing the compositional patterns in human drug targets. For further validation, in a rigorous leave-one-out cross-validation test, the model achieved 88.1% sensitivity, 83.0% specificity, 85.5% accuracy, and an MCC of 0.712. The proposed method was also tested on a second dataset, on which the pipeline gave promising results. We suggest that the present approach can be applied as a complementary tool to existing methods for novel drug target prediction.
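To make the sequence-based features concrete, the following is a minimal sketch of how amino acid composition and dipeptide composition could be computed from a protein sequence. It is an illustration under stated assumptions, not the implementation used in this work: the function names (`amino_acid_composition`, `dipeptide_composition`, `feature_vector`) are hypothetical, compositions are expressed as fractions rather than percentages, and residues outside the 20 standard amino acids are skipped. Amino acid property group composition is omitted because the grouping is not specified here.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (assumed definition)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return {aa: residues.count(aa) / len(residues) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each overlapping residue pair among all valid pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Pairs containing a non-standard residue are skipped (assumption).
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid dipeptides")
    return {dp: c / total for dp, c in counts.items()}


def feature_vector(seq):
    """Concatenate the 20 composition and 400 dipeptide features."""
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    return [aac[a] for a in AMINO_ACIDS] + [dpc[d] for d in DIPEPTIDES]


if __name__ == "__main__":
    # Made-up example sequence, used only to show the vector length.
    print(len(feature_vector("MKTAYIAKQR")))  # 420
```

Vectors of this kind, one per protein, would form the input to the class-balancing, feature-selection, and classification steps described above.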