Weighted pivot coordinates for partial least squares‐based marker discovery in high‐throughput compositional data

Nikola Štefelová, Karel Hron, Javier Palarea‐Albaladejo

doi:10.1002/sam.11514

Abstract

High‐throughput data representing large mixtures of chemical or biological signals are routinely produced in the molecular sciences. Given a number of samples, partial least squares (PLS) regression is a well‐established statistical method to investigate associations between these signals and any continuous response variables of interest. However, technical artifacts generally make the raw signals not directly comparable between samples. Thus, data normalization is required before any meaningful scientific information can be drawn. This often allows the processed signals to be characterized as compositional data, where the relevant information is contained in the pairwise log‐ratios between the components of the mixture. The (log‐ratio) pivot coordinate approach aggregates the pairwise log‐ratios of a component to all the remaining components into a single variable. This simplifies interpretation and the investigation of the components' relative importance but, particularly in a high‐dimensional context, the aggregated log‐ratios can easily mix up information from different underlying processes. In this context, we propose a weighting strategy for the construction of pivot coordinates for PLS regression which draws on the correlation between the response variable and the pairwise log‐ratios. Using real and simulated data sets, we demonstrate that this proposal enhances the discovery of biological markers in high‐throughput compositional data.
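
The aggregation described in the abstract can be illustrated with a minimal sketch of the standard (unweighted) first pivot coordinate, which for a composition with D parts equals the sum of the pairwise log‐ratios of the first part to every other part, scaled by 1/sqrt(D(D-1)). The function names and the toy data below are illustrative assumptions, not taken from the paper, and the paper's correlation‐based weighting is not reproduced here.

```python
import numpy as np


def first_pivot_coordinate(x):
    """Unweighted first pivot coordinate for each row of a positive (n, D) array.

    Illustrative helper (hypothetical name): scaled log-ratio of part 1
    to the geometric mean of the remaining D - 1 parts.
    """
    x = np.asarray(x, dtype=float)
    n_parts = x.shape[1]
    log_x = np.log(x)
    # Log of the geometric mean of parts 2..D is the mean of their logs.
    log_gmean_rest = log_x[:, 1:].mean(axis=1)
    return np.sqrt((n_parts - 1) / n_parts) * (log_x[:, 0] - log_gmean_rest)


def scaled_sum_of_pairwise_logratios(x):
    """Same quantity written as an aggregate of pairwise log-ratios ln(x_1 / x_j)."""
    x = np.asarray(x, dtype=float)
    n_parts = x.shape[1]
    pairwise = np.log(x[:, [0]] / x[:, 1:])  # shape (n, D - 1)
    return pairwise.sum(axis=1) / np.sqrt(n_parts * (n_parts - 1))


# Toy positive data standing in for normalized signals (an assumption).
rng = np.random.default_rng(0)
signals = rng.lognormal(size=(5, 4))

z_pivot = first_pivot_coordinate(signals)
z_pairwise = scaled_sum_of_pairwise_logratios(signals)
```

Both functions return the same values, which makes explicit that every pairwise log‐ratio enters the ordinary pivot coordinate with equal weight. A weighted variant in the spirit of the abstract would replace the plain sum with weights informed by the correlation between each pairwise log‐ratio and the response.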

Journal: Statistical Analysis and Data Mining: The ASA Data Science Journal
Publication Date: May 19, 2021
Citations: 8
