Abstract
Background: Computational approaches in the identification of drug targets are expected to reduce time and effort in drug development. Advances in genomics and proteomics provide the opportunity to uncover properties of druggable genomes. Although several studies have sought to distinguish drug targets from non-drug targets, they mainly focus on the sequences and functional roles of proteins. Many other properties of proteins have not been fully investigated.
Methods: Using the DrugBank database (version 3.0), which contains nearly 6,816 drug entries including 760 FDA-approved drugs and 1,822 of their targets, together with the human UniProt/Swiss-Prot databases, we defined 1,578 non-redundant drug target proteins and 17,575 non-drug target proteins. To select these non-redundant protein datasets, we built four datasets (A, B, C, and D) by considering clustering of paralogous proteins.
Results: We first reassessed the widely used properties of drug target proteins. We confirmed and extended the findings that drug target proteins (1) tend to have more hydrophobic and less polar sequences, fewer PEST sequences, and more signal peptide sequences, and (2) are more involved in enzyme catalysis, oxidation and reduction in cellular respiration, and operational genes. In this study, we proposed new properties (essentiality, expression pattern, PTMs, and solvent accessibility) for effectively identifying drug target proteins. We found that (1) drug targetability and protein essentiality are decoupled, (2) druggability is associated with high expression level and tissue specificity, and (3) functional post-translational modification residues are enriched in drug target proteins. In addition, to predict the drug targetability of proteins, we exploited two machine learning methods (Support Vector Machine and Random Forest). When we predicted drug targets by combining previously known protein properties with the proposed new properties, an F-score of 0.8307 was obtained.
Conclusions: Integrating the newly proposed properties improves prediction performance, and these properties are related to drug targets. We believe that our study will provide a new aspect in inferring drug-target interactions.
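For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F-score is computed from classification counts. The abstract does not state which F-score variant was used; the balanced F1 (beta = 1) is assumed here, and the function name and the example counts are hypothetical, not taken from the study.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion-matrix counts.

    tp: predicted drug targets that are true drug targets
    fp: predicted drug targets that are not drug targets
    fn: true drug targets that the classifier missed
    beta = 1 gives the balanced F1 score (an assumption; the source
    does not say which variant it reports).
    """
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only.
example = f_score(tp=80, fp=20, fn=10)
```

Because the score combines precision and recall, it is less sensitive than plain accuracy to the imbalance between the drug target and non-drug target classes described in the Methods.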
Highlights
Computational approaches in the identification of drug targets are expected to reduce time and effort in drug development
Similarities in amino acid sequences with existing drug targets and in the functional roles of target proteins, including G-protein-coupled receptors (GPCRs), enzymes, and ion channels, have been the main resources for inferring drug-target interactions, and many predictions have been performed within each functional category [2]
We investigated whether hDP+ include signal peptide sequences, which play an important role in pharmacokinetics [29]
Summary
Computational approaches in the identification of drug targets are expected to reduce time and effort in drug development. More resources, including side effects of drugs, drug-drug interactions, and protein-protein interactions, have been incorporated for predicting new drug targets [3, 4]. Such prediction efforts will be advanced if more properties of drug targets can be revealed. When Hopkins and Groom [5] identified 399 non-redundant molecular targets, these targets belonged to only 130 protein families, half of which fell into just six gene families, including GPCRs and serine/threonine and tyrosine protein kinases. At that time, they predicted that the druggable genome would comprise approximately 3,000 genes and that the number of drug targets would be around 600-1,500. This database contains drug-target interactions with gene annotations from Swiss-Prot [10].