In the early stages of drug discovery and pre-clinical studies, understanding the metabolic stability of new molecules is crucial. Research on pre-trained deep learning models for molecular property prediction has progressed rapidly, and many such models have been released as open source. However, most of these models are trained on either a 2D graph or a 1D sequence, so the learned representation depends on the input data format. Combining multiple representations can therefore broaden the scope of learning and may be a manageable and effective way to enhance performance. We propose a novel hybrid model for predicting metabolic stability that integrates representations from both graph-based and sequence-based models pre-trained on molecular features. This approach draws on the combined strengths of the 2D topological and 1D sequential information of molecules. HiMol, a graph neural network (GNN) model, and Molformer, a sequence-based Transformer model, were selected for integration, and we named the resulting model HiMolformer. HiMolformer outperformed the other models evaluated. We frame prediction as a regression task, using an empirical dataset from the Korea Chemical Bank (KCB) comprising 3,498 molecules with mouse liver microsome (MLM) and human liver microsome (HLM) data obtained from actual metabolic reaction experiments. To the best of our knowledge, this is the first attempt to develop MLM and HLM prediction models using regression with a single SMILES input. The source code of this model is available at https://github.com/YUNSEOKWOO/HiMolformer.
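
As a rough illustration of what integrating a graph-based and a sequence-based representation can look like, the sketch below concatenates two embedding vectors and passes the result through a linear regression head. The abstract does not state how the representations are fused, so concatenation is an assumption, and the function names `fuse_embeddings` and `linear_head`, the toy vectors, and the weights are all hypothetical placeholders rather than anything taken from the HiMolformer code.

```python
from typing import List, Sequence


def fuse_embeddings(graph_emb: Sequence[float], seq_emb: Sequence[float]) -> List[float]:
    """Concatenate a graph-model embedding with a sequence-model embedding.

    Assumption: simple concatenation; the actual fusion method is not
    described in the abstract.
    """
    return list(graph_emb) + list(seq_emb)


def linear_head(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Map a fused feature vector to a single regression value."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Toy stand-ins for embeddings that pre-trained encoders would produce
# from one SMILES string (hypothetical values, not real model outputs).
graph_emb = [0.5, -1.0, 2.0]  # e.g. from a graph encoder such as HiMol
seq_emb = [1.0, 0.0]          # e.g. from a sequence encoder such as Molformer

fused = fuse_embeddings(graph_emb, seq_emb)

# Hypothetical weights; in practice these would be learned from the
# MLM or HLM regression targets.
weights = [0.5, 0.5, 0.5, 1.0, 1.0]
bias = 1.0
prediction = linear_head(fused, weights, bias)
```

In a real pipeline the two embeddings would come from the pre-trained encoders and the head would be trained on the measured stability values; this sketch only shows the data flow from two representations to one regression output.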