Gas chromatography-mass spectrometry (GC-MS) is one of the most important instruments for analyzing volatile organic compounds. However, the complexity of real samples and the limitations of chromatographic separation capabilities lead to coeluting compounds without ideal separation. In this study, a Transformer-based automatic resolution method (GCMSFormer) is proposed to resolve mass spectra from GC-MS peaks in an end-to-end manner, predicting the mass spectra of components directly from the raw overlapping peaks data. Furthermore, orthogonal projection resolution (OPR) was integrated into GCMSFormer to resolve minor components. The GCMSFormer model was trained, validated, and tested using 100,000 augmented data. It achieves 99.88% of the bilingual evaluation understudy (BLEU) value on the test set, significantly higher than the 97.68% BLEU value of the baseline sequence-to-sequence model long short-term memory (LSTM). GCMSFormer was also compared with two nondeep learning resolution tools (MZmine and AMDIS) and two deep learning resolution tools (PARAFAC2 with DL and MSHub/GNPS) on a real plant essential oil GC-MS data set. Their resolution results were compared on evaluation metrics, including the number of compounds resolved, mass spectral match score, correlation coefficient, explained variance, and resolution speed. The results demonstrate that GCMSFormer has better resolution performance, higher automation, and faster resolution speed. In summary, GCMSFormer is an end-to-end, fast, fully automatic, and accurate method for analyzing GC-MS data of complex samples.
Read full abstract