Liquid chromatography retention time (RT) prediction plays a crucial role in metabolite identification, a challenging and essential task in untargeted metabolomics. Accurate molecular representation is vital for reliable RT prediction. To address this, we propose a novel molecular representation learning framework, ABCoRT (Atom-Bond Co-learning for Retention Time prediction), designed for predicting metabolite retention times. Our model transforms molecular graphs into dual hypergraphs, enabling atom and bond information to be updated collaboratively within both the molecular graphs and the hypergraphs, thereby producing highly informative molecular representations. We evaluated ABCoRT on the large-scale Small Molecule Retention Time (SMRT) data set comprising 80,038 molecules. After removing nonretained molecules, our model achieved a mean absolute error (MAE) of 25.75 s and a mean relative error (MRE) of 3.24%. We also fine-tuned pretrained ABCoRT models on six additional data sets from PredRet, achieving the lowest MAEs on five of them. Furthermore, in metabolite screening conducted on the MetaboBASE and RIKEN_PlaSM data sets from the MassBank of North America, ABCoRT filtered out 38.35% and 28.46% of candidate compounds, respectively.
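
To illustrate the dual-hypergraph idea, the following is a minimal sketch that assumes the standard dual hypergraph transformation: each bond becomes a node, and each atom becomes a hyperedge joining the bonds incident to it, so the incidence matrix of the dual is the transpose of the original. The abstract does not specify ABCoRT's exact construction, and the function names and the toy molecule below are hypothetical.

```python
from typing import Dict, List, Tuple

Bond = Tuple[int, int]  # a bond is a pair of atom indices


def incidence_matrix(num_atoms: int, bonds: List[Bond]) -> List[List[int]]:
    """Atoms x bonds incidence matrix: entry [a][b] is 1 if atom a is an endpoint of bond b."""
    matrix = [[0] * len(bonds) for _ in range(num_atoms)]
    for bond_idx, (u, v) in enumerate(bonds):
        matrix[u][bond_idx] = 1
        matrix[v][bond_idx] = 1
    return matrix


def transpose(matrix: List[List[int]]) -> List[List[int]]:
    """Swap rows and columns; applied to the incidence matrix this gives the dual's incidence."""
    return [list(row) for row in zip(*matrix)]


def dual_hypergraph(num_atoms: int, bonds: List[Bond]) -> Dict[int, List[int]]:
    """Map each atom (now a hyperedge) to the bond indices (now nodes) it connects."""
    hyperedges: Dict[int, List[int]] = {atom: [] for atom in range(num_atoms)}
    for bond_idx, (u, v) in enumerate(bonds):
        hyperedges[u].append(bond_idx)
        hyperedges[v].append(bond_idx)
    return hyperedges


# Hypothetical toy molecule: heavy-atom skeleton of ethanol, C0-C1-O2.
toy_bonds = [(0, 1), (1, 2)]
toy_incidence = incidence_matrix(3, toy_bonds)   # atoms x bonds
toy_dual_incidence = transpose(toy_incidence)    # bonds x atoms
toy_dual = dual_hypergraph(3, toy_bonds)
```

In this toy case the central carbon (atom 1) becomes a hyperedge containing both bonds, while each terminal atom becomes a hyperedge containing a single bond. A model in this style could then pass messages over the dual to update bond features in the same way it updates atom features over the original graph.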