Chemogenomics Data Research Articles

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules have so far been registered with the Chemical Abstracts Service. According to a recent study, around 10^60 molecules can currently be classified as new drug-like molecules. The library of such molecules is now regarded as 'dark chemical space' or 'dark chemistry.' To explore these hidden molecules scientifically, a good number of live, regularly updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, 'genomics', 'proteomics' and 'in-silico simulation', will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge that must be handled efficiently. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques extensively used in VS. QSAR is well known for its high and fast throughput screening with a satisfactory hit rate. QSAR model building involves (i) chemo-genomics data collection from a database or the literature; (ii) calculation of the right descriptors from the molecular representation; (iii) establishing a relationship (model) between biological activity and the selected descriptors; and (iv) application of the QSAR model to predict the biological property of the molecules. All hits obtained by the VS technique need to be experimentally verified. The present mini-review highlights web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.

This paper presents a new Quantitative Structure-Activity Relationship (QSAR) model based on the Extreme Learning Machine (ELM) to predict the biological activity of the benchmark Escape-Data sets compounds, in order to provide an effective learning solution for regression analysis. In the pre-processing phase, the k-Nearest Neighbours (k-NN) algorithm is applied to the chemo-genomics datasets to predict missing values. In the second phase, a Genetic algorithm hybridized with the Binary Whale Optimization algorithm (GBWOA) is adapted to determine the significant, optimized features in the feature selection phase. In the third phase, the min–max method is used to transform all features to binary form in order to increase the efficiency of the proposed model by smoothing the data points and reducing fluctuation among features. In the final phase, ELM is used as the regression algorithm to predict the activity of chemo-genomics chemical compounds. Different experiments have been performed on a dataset collected from the ExCAPE chemo-genomics database project, composed of 43509 compounds and 1134 targets, along with biological activity and 40 chemical descriptors. The experimental results show that the proposed model is efficient in improving the level of prediction based on some statistical measurements. Also, ELM produced satisfactory results when the number of hidden nodes is greater than or equal to 1000 L. Moreover, the proposed model achieved high accuracy using the R² measure (≈0.971), which outperforms other algorithms in the literature such as WOA, ALO, BAT and CSA, with accuracies of ≈0.673, ≈0.753, ≈0.680, and ≈0.897, respectively. In addition, the docking results succeeded in validating the current QSAR model. In the current research, 41686 (95.81%) compounds are lead compounds and 36965 (84.95%) compounds are candidates for multi-target genes.
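The scaling and regression phases described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the k-NN imputation and GBWOA feature-selection phases are omitted, min–max scaling is assumed to mean rescaling each descriptor to [0, 1], the small ridge term is an assumption added for numerical stability, and the names `min_max_scale`, `solve`, `ELMRegressor` and `r2_score` are hypothetical. The data are synthetic and stand in for descriptor/activity pairs.

```python
import math
import random


def min_max_scale(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class ELMRegressor:
    """Single-hidden-layer ELM: random fixed hidden weights, fitted output weights."""

    def __init__(self, n_hidden=10, ridge=1e-6, seed=0):
        self.n_hidden, self.ridge, self.seed = n_hidden, ridge, seed

    def _hidden(self, rows):
        # Sigmoid activations of the random hidden layer.
        return [[1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(ws, x)) + b)))
                 for ws, b in zip(self.w, self.b)] for x in rows]

    def fit(self, rows, y):
        rng = random.Random(self.seed)
        d = len(rows[0])
        self.w = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(self.n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        h = self._hidden(rows)
        k = self.n_hidden
        # Regularized normal equations: (H^T H + ridge * I) beta = H^T y.
        a = [[sum(r[i] * r[j] for r in h) + (self.ridge if i == j else 0.0)
              for j in range(k)] for i in range(k)]
        rhs = [sum(r[i] * t for r, t in zip(h, y)) for i in range(k)]
        self.beta = solve(a, rhs)
        return self

    def predict(self, rows):
        return [sum(bt * v for bt, v in zip(self.beta, r)) for r in self._hidden(rows)]


def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data: two "descriptors" and a made-up linear "activity".
data_rng = random.Random(1)
descriptors = [[data_rng.random() * 5, data_rng.random() * 50] for _ in range(40)]
activity = [2.0 * a + 0.1 * b for a, b in descriptors]
scaled = min_max_scale(descriptors)
model = ELMRegressor(n_hidden=10).fit(scaled, activity)
print("training R^2:", r2_score(activity, model.predict(scaled)))
```

The sketch uses only ten hidden nodes to keep the pure-Python linear solve fast; the abstract reports satisfactory results at far larger hidden-layer sizes, where a numerical library's pseudo-inverse would be the more practical choice.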

Articles published on Chemogenomics Data

CACTI: an in silico chemical analysis tool through the integration of chemogenomic data and clustering analysis

Heli-SMACC: Helicase-targeting SMAll Molecule Compound Collection.

Epigenetic target identification strategy based on multi-feature learning

Small molecule antiviral compound collection (SMACC): A comprehensive, highly curated database to support the discovery of broad-spectrum antiviral drug molecules

Epigenetic Target Fishing with Accurate Machine Learning Models.

Epigenetic Target Profiler: A Web Server to Predict Epigenetic Targets of Small Molecules.

Automated Framework for Developing Predictive Machine Learning Models for Data-Driven Drug Discovery

CDKN2A-Inactivated Pancreatic Ductal Adenocarcinoma Exhibits Therapeutic Sensitivity to Paclitaxel: A Bioinformatics Study.

Chemogenomic profiling of breast cancer patient-derived xenografts reveals targetable vulnerabilities for difficult-to-treat tumors

Applications of Quantitative Structure-Activity Relationships (QSAR) based Virtual Screening in Drug Design: A Review.

TDR Targets 6: driving drug discovery for human pathogens through intensive chemogenomic data integration.

Multi-target QSAR modelling of chemo-genomic data analysis based on Extreme Learning Machine

Predicting kinase inhibitors using bioactivity matrix derived informer sets.

Predicting Drug Interactions From Chemogenomics Using INDIGO.

QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery.

GCDB: a glaucomatous chemogenomics database for in silico drug discovery.

Core Statistical Methods for Chemogenomic Data.

A Survey of Web-Based Chemogenomic Data Resources.

Integrative cancer pharmacogenomics to establish drug mechanism of action: drug repurposing.

ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics
