Large-Scale Distributed Training of Transformers for Chemical Fingerprinting.

Hisham Abdel-Aty, Ian R Gould

doi:10.1021/acs.jcim.2c00715


Open Access



Abstract

Transformer models have become a popular choice for a range of machine learning tasks because of their often outstanding performance. Recently, transformers have been used in chemistry for reaction classification, reaction prediction, physicochemical property prediction, and more. These models require huge amounts of data and localized compute to train effectively. In this work, we demonstrate that these models can be trained successfully for chemical problems in a distributed manner across many computers, a more common scenario for chemistry institutions. We introduce MFBERT: Molecular Fingerprints through Bidirectional Encoder Representations from Transformers. We use distributed computing to pre-train a transformer model on one of the largest aggregate datasets in the chemical literature and achieve state-of-the-art scores on a virtual screening benchmark for molecular fingerprints. We then fine-tune our model on smaller, more specific datasets to generate more targeted fingerprints and assess their quality. We use a SentencePiece tokenization model, so that the whole procedure from raw molecular representation to molecular fingerprint is data-driven, with no explicit tokenization rules.
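
The abstract does not say which distributed scheme was used. As a rough illustration only, one common approach is synchronous data parallelism: each machine computes a gradient on its own shard of the batch, the gradients are averaged, and every machine applies the same update. The sketch below shows this on a toy one-parameter regression model in plain Python. The model, the data, and the names `grad_mse` and `data_parallel_step` are hypothetical and are not taken from MFBERT.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def grad_mse(w: float, batch: List[Sample]) -> float:
    """Gradient of the mean squared error for the toy model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(w: float, shards: List[List[Sample]], lr: float) -> float:
    """One synchronous update; each shard stands in for one worker (assumption)."""
    # Each "worker" computes a gradient on its own shard only.
    local_grads = [grad_mse(w, shard) for shard in shards]
    # Stand-in for an all-reduce: average the per-worker gradients.
    avg_grad = sum(local_grads) / len(local_grads)
    # Every worker would apply this same update to its copy of the weights.
    return w - lr * avg_grad


# Toy data generated by y = 2 * x, split into two equal-sized shards.
data: List[Sample] = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]

# With equal shard sizes, the averaged gradient matches the full-batch gradient.
full_grad = grad_mse(0.0, data)
avg_grad = sum(grad_mse(0.0, s) for s in shards) / len(shards)

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.05)
```

With equal-sized shards, averaging the per-worker gradients gives the same update as a single machine processing the whole batch, which is why this scheme can spread training across many computers without changing the optimization itself. A real setup would replace the Python list of shards with a communication library and would also have to handle unequal shard sizes and network cost.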


Journal: Journal of Chemical Information and Modeling
Publication Date: Oct 4, 2022
Citations: 14
License type: CC BY 4.0


