Untargeted analysis of comprehensive two-dimensional (2D) gas chromatography time-of-flight mass spectrometry (GC×GC-TOFMS) data can be hindered by run-to-run retention time shifting. To address this challenge, tile-based Fisher ratio (F-ratio) analysis (FRA) has been developed. FRA is a supervised, untargeted approach that combines a chromatographic segmentation routine termed "tiling" with the ANOVA F-ratio statistic to discover class-distinguishing analytes while minimizing false positives arising from shifting. The tiling algorithm is designed to account for retention shifting in both separation dimensions. Although applications of FRA have been reported, there remains a need to thoroughly evaluate the robustness of FRA at different levels of run-to-run retention shifting in order to broaden the scope of its application. To this end, a novel method of simulating GC×GC-TOFMS chromatograms with realistic run-to-run shifting is presented, based on the random generation of low-frequency "shift functions". The dimensionless retention-time precision, <δr>, defined as four times the standard deviation in retention time normalized to the peak width-at-base, is used as a key modeling variable, along with the 2D chromatographic saturation, αe,2D, and the within-class relative standard deviation in peak area, RSDwc. We demonstrate that all three of these variables operate together to impact true positive discovery. To quantify the "success" of true positive discovery, GC×GC-TOFMS datasets for various combinations of <δr>, αe,2D, and RSDwc were simulated and then analyzed by FRA using a wide range of relative tile areas (RTA), a dimensionless measure of tile size. Since each hit in the FRA hit list was known a priori to be either a true or false positive based on the simulation inputs, receiver operating characteristic (ROC) curves were readily constructed.
Then, the area under the ROC curve (AUROC) was used as a metric for discovery "success" for various combinations of the modeling variables. Based on the results of this study, recommendations for tile size selection and experimental design are provided and further supported by comparison to previous tile-based FRA applications. For instance, values for <δr>, αe,2D, and RSDwc obtained from a GC×GC-TOFMS dataset of yeast metabolites suggested an optimum RTA of 6.25, corresponding closely to the RTA of 4.00 employed in that study, implying that the simulation results obtained here can be generalized to real datasets.
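The three quantities named above (the retention-time precision, the ANOVA F-ratio, and the AUROC) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of the sample standard deviation, and the per-analyte (unaveraged) form of the precision are assumptions made for illustration, and the tiling step itself is not shown.

```python
import numpy as np

def retention_precision(retention_times, width_at_base):
    """Four times the standard deviation in retention time, normalized to
    the peak width-at-base. Assumption: sample standard deviation (ddof=1)
    for a single analyte; any averaging over analytes is left out."""
    return 4.0 * np.std(retention_times, ddof=1) / width_at_base

def f_ratio(groups):
    """One-way ANOVA F-ratio: between-class variance / within-class variance.
    `groups` holds one sequence of signal values per sample class."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                       # number of classes
    n = sum(g.size for g in groups)       # total number of samples
    grand_mean = np.mean(np.concatenate(groups))
    between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups) / (k - 1)
    within = sum(np.sum((g - g.mean()) ** 2) for g in groups) / (n - k)
    return between / within

def auroc(scores, is_true_positive):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen true positive scores higher than a randomly chosen false positive
    (ties count one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_true_positive, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]    # all true/false positive pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

# Hypothetical numbers, for illustration only.
dr = retention_precision([10.0, 10.2, 9.8], width_at_base=0.8)
f = f_ratio([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
auc = auroc([5.0, 4.0, 3.0, 2.0], [True, True, False, True])
```

In this sketch a hit list is a set of F-ratio scores paired with known true/false labels, which is what makes the AUROC computable for simulated data; an AUROC of 1 would mean every true positive ranks above every false positive.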