Abstract
Many biological high-throughput datasets, such as targeted amplicon-based and metagenomic sequencing data, are compositional. A common exploratory data analysis task is to infer robust statistical associations between high-dimensional microbial compositions and habitat- or host-related covariates. To address this, a general robust statistical regression framework, RobRegCC (Robust Regression with Compositional Covariates), is proposed, which extends the linear log-contrast model with a mean-shift formulation for capturing outliers. RobRegCC includes sparsity-promoting convex and non-convex penalties for parsimonious model estimation, a data-driven robust initialization procedure, and a novel robust cross-validation model selection scheme. The procedure is implemented in the R package robregcc. Extensive simulation studies show RobRegCC's ability to perform simultaneous sparse log-contrast regression and outlier detection over a wide range of settings. To demonstrate the applicability of the workflow to real data, a gut microbiome dataset from HIV patients is analyzed, and robust associations between a sparse set of microbial species and host immune response from soluble CD14 measurements are inferred.
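As a rough sketch of the model class described above, a linear log-contrast regression with a mean-shift term can be written as follows. The notation is an assumption made here for illustration and is not taken from the abstract: y is the response vector for n samples, Z holds the log-transformed compositions over p taxa, β is the coefficient vector under the zero-sum constraint, γ is the sample-wise mean-shift vector whose nonzero entries flag outliers, and P_1 and P_2 stand for generic sparsity-promoting penalties.

```latex
% Mean-shift log-contrast model (illustrative notation, assumed here)
y = Z\beta + \gamma + \varepsilon,
\qquad Z_{ij} = \log X_{ij},
\qquad \sum_{j=1}^{p} \beta_j = 0 .

% A generic penalized estimator of this form
(\hat\beta, \hat\gamma) \in
\arg\min_{\beta,\,\gamma}\;
\frac{1}{2n}\,\lVert y - Z\beta - \gamma \rVert_2^2
+ P_1(\beta; \lambda_1) + P_2(\gamma; \lambda_2)
\quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j = 0 .
```

Under this reading, a sparse estimate of β selects the taxa associated with the response, while a sparse estimate of γ identifies the samples treated as outliers.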