Abstract

Modern comprehensive instrumentation provides unprecedented coverage of complex matrices in the form of high-dimensional, information-rich data sets. In addition to the usual biomarker research that focuses on the detection of the studied condition, we aimed to define a proper strategy for conducting a correlation analysis on an untargeted colorectal cancer case study with a data set of 102 variables corresponding to metabolites obtained from serum samples analyzed with comprehensive two-dimensional gas chromatography coupled to high-resolution time-of-flight mass spectrometry (GC × GC-HRTOF-MS). Indeed, the strength of association between the metabolites contains potentially valuable information about the molecular mechanisms involved and the underlying metabolic network associated with a global perturbation, at no additional analytical effort. Following Anscombe's quartet, we paid particular attention to four main aspects. First, the presence of non-linear relationships, assessed through the comparison of parametric and non-parametric correlation coefficients: Pearson's r, Spearman's rho, Kendall's tau and Goodman-Kruskal's gamma. Second, the visual control of the detected associations through scatterplots and their associated regressions and angles. Third, the effect and handling of atypical samples and values. Fourth, the role of the precision of the data in the attribution of the ranks through the presence of ties. Kendall's tau was found to be the method of choice for the data set at hand. Its application highlighted 17 correlations significantly altered in the active state of colorectal cancer (CRC) in comparison to matched healthy controls (HC), of which 10 were specific to this state in comparison to the remission state (R-CRC) investigated in distinct patients. Of the 25 unique metabolites involved in the correlations of interest, 15 were annotated (Metabolomics Standards Initiative level 2).
The highlighted metabolites could be used to better understand the pathology. The systematic investigation of the methodological aspects that we present allows correlation analysis to be implemented in various fields and many specific cases.
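
The fourth aspect, the effect of data precision on rank attribution through ties, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the five-sample intensity values are hypothetical, and the coefficients are computed from their textbook pair-counting definitions (Kendall's tau-b and Goodman-Kruskal's gamma) using only the Python standard library.

```python
from itertools import combinations
from math import sqrt


def pair_counts(x, y):
    """Return (concordant, discordant, tied in x only, tied in y only)."""
    conc = disc = tied_x = tied_y = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0 and dy == 0:
            continue  # tied on both variables: enters neither statistic
        if dx == 0:
            tied_x += 1
        elif dy == 0:
            tied_y += 1
        elif dx * dy > 0:
            conc += 1
        else:
            disc += 1
    return conc, disc, tied_x, tied_y


def kendall_tau_b(x, y):
    """Kendall's tau-b: ties on one variable enlarge the denominator."""
    c, d, tx, ty = pair_counts(x, y)
    return (c - d) / sqrt((c + d + tx) * (c + d + ty))


def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal's gamma: tied pairs are ignored entirely."""
    c, d, _, _ = pair_counts(x, y)
    return (c - d) / (c + d)


# Hypothetical intensities of two metabolites across five samples.
met_a = [1.04, 2.11, 2.48, 3.97, 5.02]
met_b = [0.98, 2.21, 2.62, 4.10, 4.87]

# The same data at lower precision: rounding ties two values of met_a.
met_a_rounded = [round(v) for v in met_a]
met_b_rounded = [round(v) for v in met_b]

tau_raw = kendall_tau_b(met_a, met_b)
tau_rounded = kendall_tau_b(met_a_rounded, met_b_rounded)
gamma_rounded = goodman_kruskal_gamma(met_a_rounded, met_b_rounded)

print(f"tau-b (raw):     {tau_raw:.4f}")
print(f"tau-b (rounded): {tau_rounded:.4f}")
print(f"gamma (rounded): {gamma_rounded:.4f}")
```

In this toy example the raw values are strictly monotonic, so tau-b equals 1. After rounding, one pair is tied on the first metabolite only: tau-b drops below 1 while gamma stays at 1, because gamma discards tied pairs and tau-b penalizes them. Both functions assume at least one untied pair; a constant variable would cause a division by zero.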
