Abstract

Predicting the activity of chemical compounds against a drug target (e.g. a protein) in the preliminary stages of drug development is critical. In drug discovery, this is known as quantitative structure-activity relationship (QSAR) modelling. QSAR datasets are often ill-posed for traditional machine learning (e.g. because of very high dimensionality), which limits the insights such models can provide. Here, we propose a multi-task learning (MTL) approach that enriches the original QSAR datasets with the aim of improving overall QSAR performance. The proposed approach, henceforth named MTL-AT, increases the amount of usable data through an assistant task: a supplementary dataset formed by compounds automatically extracted from other, possibly related, tasks. The main novelty of MTL-AT is the addition of a control for data leakage. We tested MTL-AT in two drug discovery scenarios: 1) using 100 unrelated QSAR datasets, and 2) using 20 QSAR datasets related to the same protein family. Results were compared against an equivalent single-task learning (STL) approach. MTL-AT outperformed STL in 45 tasks in scenario 1 and in 12 tasks in scenario 2. MTL-AT appears to be the best overall method in both scenarios, with small datasets showing the largest performance improvement from multi-task learning. These results show that multi-task learning with QSAR data has promise, but more investigation is required to determine which dataset characteristics it suits before it can be used widely in drug discovery. To the best of our knowledge, this is the first study to benchmark MTL on a large number of small datasets, which represents a more realistic scenario in drug development.
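
As an illustration only, the idea of an assistant task with a leakage control could be sketched as follows. The abstract does not specify how compounds are selected or how leakage is controlled, so everything here is an assumption: the data layout (task name mapped to compound identifiers and activities), the function name `build_assistant_task`, and the rule that any compound in the target task's held-out test set is excluded from the assistant data.

```python
from typing import Dict, Set

# Hypothetical data layout: task name -> {compound id -> measured activity}.
Task = Dict[str, float]


def build_assistant_task(
    tasks: Dict[str, Task],
    target: str,
    target_test_ids: Set[str],
) -> Dict[str, Task]:
    """Pool compounds from every task except `target`.

    Assumed leakage rule (not taken from the paper): drop any compound
    that also appears in the target task's held-out test set.
    """
    assistant: Dict[str, Task] = {}
    for name, compounds in tasks.items():
        if name == target:
            continue  # the target task itself is never part of the assistant data
        kept = {cid: y for cid, y in compounds.items() if cid not in target_test_ids}
        if kept:
            assistant[name] = kept
    return assistant


# Toy example with made-up activities.
tasks = {
    "target_A": {"c1": 6.2, "c2": 5.1, "c3": 7.4},
    "target_B": {"c2": 4.8, "c4": 6.9},
    "target_C": {"c3": 5.5, "c5": 7.0},
}
assistant = build_assistant_task(tasks, "target_A", target_test_ids={"c3"})
print(assistant)
```

In this toy example, compound `c3` is held out for testing on `target_A`, so its measurement in `target_C` is left out of the assistant data, while `c2` (a training compound) is kept.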
