Feature extraction is the most fundamental step in analyzing liquid chromatography-mass spectrometry (LC-MS) datasets. However, traditional methods require careful parameter selection and re-optimization for each new dataset, which hinders efficient and objective large-scale data analysis. The pure ion chromatogram (PIC) is widely used because it avoids the peak-splitting problem of extracted ion chromatograms (EICs) and regions of interest (ROIs). Here, we developed a deep learning-based pure ion chromatogram method (DeepPIC) that uses a customized U-Net to find PICs directly and automatically from centroid-mode LC-MS data. The model was trained, validated, and tested on the Arabidopsis thaliana dataset with 200 input-label pairs. DeepPIC was integrated into KPIC2, and the combination covers the entire processing pipeline for metabolomics datasets, from raw data to discriminant models. KPIC2 with DeepPIC was compared against competing methods (XCMS, FeatureFinderMetabo, and peakonly) on the MM48, simulated MM48, and quantitative datasets. These comparisons showed that DeepPIC outperforms XCMS, FeatureFinderMetabo, and peakonly in recall rates and in correlation with sample concentrations. Five datasets from different instruments and samples were used to evaluate the quality of the PICs and the universal applicability of DeepPIC, and 95.12% of the found PICs precisely matched their manually labeled PICs. Therefore, KPIC2+DeepPIC is an automatic, practical, and off-the-shelf method for extracting features directly from raw data, outperforming traditional methods that rely on careful parameter tuning. It is publicly available at https://github.com/yuxuanliao/DeepPIC.