Figure 1. Data processing thresholds and missed biological insights. A) All peaks generated by the MS-DIAL software for the MTBLS1684 study of 499 cord blood samples. B) Only peaks significantly correlated (Spearman coefficient) with birth weight at an FDR threshold of 0.05. The mass of a proton (1.00784) was added to the neutral mass of each peak reported in the original peak list so that it could be compared against the MS-DIAL peak list. A peak from the original peak list was considered present in the MS-DIAL peak list if it matched within a retention time threshold of 0.05 min and a mass accuracy threshold of 0.01 Da. Full code is available at https://colab.research.google.com/drive/1eV2ywgLtg0RyJ9qzuVl45KGWlmD0JxPy#scrollTo=c38d63Y8pevi . Of the 4,712 peaks in the original peak list, 645 had zero, 3,885 had one, 178 had two, and 4 had three matching peaks in the MS-DIAL data matrix, so a total of 4,253 peaks (4,253/63,393, ~7%) in the MS-DIAL data matrix were covered by the original peak list. Of the 151 significantly associated peaks (B) in the original peak list, 68 had zero, 77 had one, and 6 had two matching peaks among the 623 significantly associated peaks in the MS-DIAL data matrix (89/623, ~14% coverage).
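The matching rule described in the caption can be illustrated with a minimal Python sketch. It is not the authors' Colab code: the function names (`count_matches`, `covered_from_histogram`), the tuple layout of the peak lists, and the toy peaks are hypothetical. Only the constants (1.00784, 0.05 min, 0.01 Da) and the tallies come from the caption. The coverage total is computed as the sum of per-peak match counts, which assumes that no MS-DIAL peak is matched by more than one original peak.

```python
from collections import Counter

# Constants quoted in the caption.
PROTON_MASS = 1.00784   # added to each neutral mass before comparison
RT_TOL_MIN = 0.05       # retention time threshold, minutes
MZ_TOL_DA = 0.01        # mass accuracy threshold, Da


def count_matches(original_peaks, msdial_peaks,
                  rt_tol=RT_TOL_MIN, mz_tol=MZ_TOL_DA):
    """Return, for each original peak, the number of matching MS-DIAL peaks.

    original_peaks: list of (neutral_mass, rt_min) tuples (hypothetical layout)
    msdial_peaks:   list of (mz, rt_min) tuples (hypothetical layout)
    """
    counts = []
    for neutral_mass, rt in original_peaks:
        mz = neutral_mass + PROTON_MASS
        n = sum(
            1
            for m, r in msdial_peaks
            if abs(m - mz) <= mz_tol and abs(r - rt) <= rt_tol
        )
        counts.append(n)
    return counts


def covered_from_histogram(histogram):
    """Total MS-DIAL peaks covered, given {matches_per_peak: n_original_peaks}.

    Assumes each MS-DIAL peak is matched by at most one original peak.
    """
    return sum(k * n for k, n in histogram.items())


# Toy data (invented for illustration only).
original = [(100.0, 1.00), (200.0, 2.00), (300.0, 3.00)]
msdial = [(101.00784, 1.01), (101.0100, 0.98), (201.2, 2.0), (301.00784, 3.2)]

counts = count_matches(original, msdial)
histogram = Counter(counts)

# Tallies reported in the caption.
all_peaks = {0: 645, 1: 3885, 2: 178, 3: 4}
significant_peaks = {0: 68, 1: 77, 2: 6}
covered_all = covered_from_histogram(all_peaks)
covered_significant = covered_from_histogram(significant_peaks)
```

With the caption's tallies, `covered_all` is 4,253 (3,885 + 2 × 178 + 3 × 4) and `covered_significant` is 89 (77 + 2 × 6), which are the numerators of the two coverage fractions. For peak lists of the size reported here, the nested loop would be slow; sorting by m/z and using a windowed search would be a reasonable substitute.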