Nontargeted breath analysis in real time using high-resolution mass spectrometry (HRMS) is a promising approach for high-coverage profiling of metabolites in human exhaled breath. However, the information-rich and unique non-Gaussian metabolic signal shapes of real-time HRMS-based data pose a significant challenge for efficient data processing. Taking secondary electrospray ionization high-resolution mass spectrometry (SESI-HRMS) as a representative real-time HRMS technique, this work presents BreathXplorer, an open-source Python package designed for processing real-time exhaled breath data comprising multiple exhalations. BreathXplorer is composed of four main modules. The first module applies either a topological algorithm or a Gaussian mixture model (GMM) to determine the start and end points of each exhalation. Next, density-based spatial clustering of applications with noise (DBSCAN) is employed to cluster m/z values belonging to the same metabolic feature, followed by an intensity relative standard deviation (RSD) filter to extract real breath metabolic features. BreathXplorer also provides functions for (1) aligning features across samples and (2) associating MS/MS spectra with their corresponding metabolic features for downstream compound annotation. Manual inspection of the metabolic features extracted from SESI-HRMS breath data suggests that BreathXplorer can achieve 100% accuracy in identifying the start and end points of each exhalation and can acquire accurate quantitative measurements of each breath feature. In a proof-of-concept study on exercise breathomics using SESI-HRMS, BreathXplorer successfully reveals significantly changed metabolites that are pertinent to exercise. BreathXplorer is publicly available on GitHub (https://github.com/HuanLab/breathXplorer). It provides a powerful and convenient tool for researchers to process breathomics data obtained by direct analysis using HRMS.
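The clustering and filtering step described above can be illustrated with a minimal, standard-library-only sketch. It is not BreathXplorer's implementation or API: the function names (`dbscan_1d`, `rsd`), the toy data, the `eps` and `min_samples` values, and the RSD cutoff are all hypothetical. It also assumes that a real breath feature is one whose intensity varies strongly across scans (high RSD), whereas steady background has a low RSD; the abstract does not state the direction of the filter.

```python
from statistics import mean, stdev


def dbscan_1d(values, eps, min_samples):
    """Label 1-D points with DBSCAN; -1 marks noise (hypothetical helper)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    n = len(xs)
    # Neighbourhood of each point: every point within eps, itself included.
    neighbours = [[j for j in range(n) if abs(xs[j] - xs[i]) <= eps]
                  for i in range(n)]
    core = [len(nb) >= min_samples for nb in neighbours]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # Grow a new cluster outward from an unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if core[q]:  # only core points keep expanding the cluster
                        stack.append(q)
        cluster += 1
    # Map labels from sorted order back to the input order.
    out = [-1] * n
    for pos, i in enumerate(order):
        out[i] = labels[pos]
    return out


def rsd(intensities):
    """Relative standard deviation: sample standard deviation over the mean."""
    m = mean(intensities)
    return stdev(intensities) / m if m else 0.0


# Toy (m/z, intensity) points pooled across scans; values are made up.
points = [
    (100.0001, 10), (100.0002, 200), (100.0000, 15), (100.0003, 180),
    (150.0500, 50), (150.0501, 52), (150.0499, 49), (150.0502, 51),
    (175.1234, 5),
]
EPS = 0.001          # assumed m/z tolerance
MIN_SAMPLES = 3      # assumed minimum points per feature
RSD_CUTOFF = 0.3     # assumed threshold; direction of the filter is an assumption

labels = dbscan_1d([mz for mz, _ in points], EPS, MIN_SAMPLES)

# Group intensities by cluster label, ignoring noise points.
by_cluster = {}
for (_, intensity), label in zip(points, labels):
    if label != -1:
        by_cluster.setdefault(label, []).append(intensity)

# Keep clusters whose intensity fluctuates strongly across scans.
kept = sorted(c for c, vals in by_cluster.items() if rsd(vals) > RSD_CUTOFF)
print(labels, kept)
```

In this toy data, the points near m/z 100 rise and fall in intensity the way a signal tied to exhalations might, the points near m/z 150 stay nearly constant, and the single point near m/z 175 has too few neighbours to form a cluster.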