Abstract
Background: New dipeptidyl peptidase-4 (DPP-4) inhibitors need to be developed as agents with low adverse effects for the treatment of type 2 diabetes mellitus. This study aims to build quantitative structure-activity relationship (QSAR) models using the artificial intelligence paradigm. Rotation Forest and a Deep Neural Network (DNN) were used to build the QSAR models. We compared principal component analysis (PCA) with sparse PCA (SPCA) as the transformation methods within Rotation Forest. K-modes clustering with Levenshtein distance was used to select molecules, and CatBoost was used for feature selection.

Results: Molecule selection with the K-modes clustering algorithm yielded 1020 DPP-4 inhibitor molecules, with logP values ranging from -1.6693 to 4.99044. Two fingerprint methods, extended connectivity fingerprint and functional class fingerprint, each with diameters of 4 and 6, were used to construct four fingerprint datasets: ECFP_4, ECFP_6, FCFP_4, and FCFP_6. There are 1024 features from the four fingerprint datasets, from which features were then selected using the CatBoost method. With CatBoost feature selection, the QSAR models performed well for both the machine learning and deep learning methods: sensitivity, specificity, accuracy, and Matthews correlation coefficient were all above 70% at feature importance levels of 60%, 70%, 80%, and 90%.

Conclusion: The K-modes clustering algorithm can produce a representative subset of DPP-4 inhibitor molecules. Feature selection on the fingerprint datasets using CatBoost is best applied before building the QSAR Classification and QSAR Regression models. QSAR Classification using Machine Learning and QSAR Classification using Deep Learning each achieved an accuracy above 70%. The QSAR RFC-PCA and QSAR RFR-PCA models performed better than the QSAR RFC-SPCA and QSAR RFR-SPCA models because they were more time-efficient.
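The four evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function name `classification_metrics` and the toy labels are assumptions made for illustration, with 1 standing for an active compound and 0 for an inactive one.

```python
import math


def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy, and MCC for binary labels.

    Hypothetical helper for illustration; labels are assumed to be 0 or 1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against empty classes so the ratios stay defined.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Toy labels (made up for this example, not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Note that the Matthews correlation coefficient ranges from -1 to 1, so a value reported as "above 70%" corresponds to an MCC above 0.7 on that scale.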
Highlights
The K-modes clustering algorithm is used to group dipeptidyl peptidase-4 (DPP-4) inhibitor compounds based on their molecular fingerprints, which supports the rational selection of molecules
Levenshtein distance serves as the dissimilarity measure in the K-modes clustering algorithm: it quantifies how close two DPP-4 inhibitor molecules are through a string comparison of their molecular bit fingerprint vectors
New dipeptidyl peptidase-4 (DPP-4) inhibitors need to be developed to be used as agents with low adverse effects for the treatment of type 2 diabetes mellitus
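The string comparison of bit fingerprints mentioned in the highlights can be sketched as follows. This is an illustrative sketch only, assuming each fingerprint is stored as a string of "0" and "1" characters; the function names `levenshtein` and `assign_to_nearest_mode`, the 8-bit toy fingerprints, and the fixed modes are hypothetical and are not taken from the study. It shows only the assignment step of a K-modes-style procedure, not the mode update.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def assign_to_nearest_mode(fingerprints, modes):
    """Return, for each fingerprint, the index of the closest mode."""
    return [
        min(range(len(modes)), key=lambda k: levenshtein(fp, modes[k]))
        for fp in fingerprints
    ]


# Toy 8-bit fingerprints and two fixed modes (made up for this example).
fingerprints = ["11110000", "11100000", "00001111", "00011111"]
modes = ["11110000", "00001111"]
labels = assign_to_nearest_mode(fingerprints, modes)
print(labels)
```

For equal-length bit strings the Levenshtein distance is never larger than the Hamming distance and can be smaller: "0101" and "1010" differ in all four positions, yet only two edits (delete the leading "0", append a "0") turn one into the other.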
Summary
New dipeptidyl peptidase-4 (DPP-4) inhibitors need to be developed as agents with low adverse effects for the treatment of type 2 diabetes mellitus. In-silico methods use computers as a tool in drug discovery and are more cost-efficient than conventional methods, which are known to be time-consuming and costly [3]. They offer simulations and calculations that can rationally reduce the number of proposed compounds and assist in studying everything from drug interactions with targets to the toxic properties of compounds and their metabolites [4]. QSAR is a ligand-based virtual screening method that studies the relationship between the chemical structures and biological activities of molecules; this relationship can be calculated to derive a model or equation that predicts the activity of a compound [4,5,6].