Abstract

Retention time prediction in high-performance liquid chromatography (HPLC) has been the subject of many studies because it can improve the identification of unknown molecules in untargeted profiling using HPLC coupled with high-resolution mass spectrometry. Many approaches have been developed for retention time prediction in liquid chromatography, using data sets of varying size, various molecular properties, and different machine learning algorithms. The recently built large retention time data set of standard compounds from the Metabolite and Chemical Entity Database (METLIN) allows researchers to create a model for predicting the retention times of small molecules with a wide variety of structures and physicochemical properties. Using this data set, the largest available, we studied the ability of different deep learning architectures to predict retention times. The models were trained on molecular fingerprints and on SMILES (a string representation of a molecule) encoded as one-hot matrices. The best result was achieved with a one-dimensional convolutional neural network (1D CNN) that takes SMILES as input. The proposed model reached a mean absolute error of 34.7 s and a median absolute error of 18.7 s, outperforming the results previously reported for this data set. The 1D CNN pre-trained on the METLIN SMRT data set was then transferred to five other data sets to evaluate its generalization ability.
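To make the input representation concrete, the following is a minimal sketch of encoding a SMILES string as a one-hot matrix. It is an illustration under stated assumptions, not the authors' preprocessing: the function names `build_vocab` and `one_hot_smiles`, the character-level vocabulary, the example molecules, and the padding length `max_len` are all hypothetical. In particular, splitting by character breaks two-letter atoms such as `Cl` and `Br` into separate symbols, which a real tokenizer would likely handle differently.

```python
# Sketch only: character-level one-hot encoding of SMILES strings.
# The vocabulary, padding length, and function names are assumptions,
# not details taken from the paper.

def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to a column index."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot_smiles(smiles, vocab, max_len):
    """Return a max_len x len(vocab) matrix with one 1 per character.

    Rows beyond the end of the string stay all-zero (padding);
    strings longer than max_len are truncated.
    """
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for pos, ch in enumerate(smiles[:max_len]):
        matrix[pos][vocab[ch]] = 1
    return matrix


# Hypothetical example molecules: ethanol, benzene, acetic acid.
smiles_examples = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles_examples)
encoded = one_hot_smiles("CCO", vocab, max_len=10)
```

Each row of `encoded` corresponds to one position in the string and each column to one symbol, so a 1D convolution can slide along the position axis while treating the symbol columns as input channels.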
