Custom Tokenization Dictionary, CUSTODI: A General, Fast, and Reversible Data-Driven Representation and Regressor.

Shachar Fite, Omri Nitecki, Zeev Gross

doi:10.1021/acs.jcim.1c00563


Journal: Journal of Chemical Information and Modeling	Publication Date: Jun 28, 2021
Citations: 2

Affiliation: Technion – Israel Institute of Technology

#Small Training Sets #Benchmark Methodologies

Abstract

Custom tokenization dictionary (CUSTODI) is introduced as a novel approach to the problem of molecular representation, and in particular to the challenge of molecular property prediction. Herein, the motivating theory, the representation, and the model are presented and shown to perform in line with benchmark methodologies. CUSTODI is distinguished by its applicability to small training sets, and the developed theory suggests that it could be used for a priori estimation of future fit quality on any given dataset, regardless of the method used for fitting.

Full Text