Coherent Point Drift Peak Alignment Algorithms Using Distance and Similarity Measures for Two-Dimensional Gas Chromatography Mass Spectrometry Data.

Zeyu Li, Zichun Zhong, Sikai Zhong, Xiang Zhang, Seongho Kim, Ikuko Kato

doi:10.1002/cem.3236

Abstract

Peak alignment is a vital preprocessing step before downstream analysis, such as biomarker discovery and pathway analysis, for two-dimensional gas chromatography mass spectrometry (2DGCMS)-based metabolomics data. Because of uncontrollable experimental conditions, such as differences in temperature or pressure, matrix effects on samples, and stationary phase degradation, retention times inevitably shift among samples during 2DGCMS experiments, which makes peaks difficult to align. Various peak alignment algorithms have been developed to correct retention time shifts for homogeneous data, heterogeneous data, or both types of mass spectrometry data. However, almost all existing algorithms focus on local alignment and suffer from low accuracy, especially when aligning dense biological data with many peaks. We have developed four global peak alignment (GPA) algorithms using coherent point drift (CPD) point matching algorithms: retention time-based CPD-GPA (RT), prior CPD-GPA (P), mixture CPD-GPA (M), and prior mixture CPD-GPA (P+M). Method RT performs peak alignment based only on the retention time distance, while methods P, M, and P+M perform peak alignment using both the retention time distance and the mass spectral similarity. Method P incorporates the mass spectral similarity through prior information, and methods M and P+M use the mixture distance measure. The four developed algorithms are applied to homogeneous and heterogeneous spiked-in data as well as two real biological data sets and compared with three existing algorithms: mSPA, SWPA, and BiPACE-2D. The results show that our CPD-GPA algorithms perform better than all three existing algorithms in terms of F1 score.
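The abstract reports accuracy as an F1 score. As a minimal sketch of how such a score could be computed for a peak alignment, assume an alignment is represented as a set of (reference peak index, target peak index) pairs and that a curated set of true pairs is available. The function name `f1_score`, the pair representation, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def f1_score(predicted, reference):
    """F1 score of predicted peak pairs against known true pairs.

    Both arguments are iterables of (reference_peak, target_peak) tuples.
    This pair representation is an assumption made for illustration.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        # No correct pairs: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy data: four true pairs; the alignment recovers two and adds one wrong pair.
reference_pairs = {(1, 1), (2, 2), (3, 3), (4, 4)}
predicted_pairs = {(1, 1), (2, 2), (3, 4)}
score = f1_score(predicted_pairs, reference_pairs)
print(round(score, 4))
```

In this toy case precision is 2/3 and recall is 1/2, so the F1 score is 4/7. Because F1 penalizes both missed pairs and spurious pairs, it suits dense data where an aligner could otherwise score well by matching too many or too few peaks.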

Full Text