Abstract

Simple multilinear methods, such as partial least squares regression (PLSR), are effective at interrelating dynamic, multivariate datasets of cell–molecular biology through high-dimensional arrays. However, data collected in vivo are more difficult, because animal-to-animal variability is often high, and each time-point measured is usually a terminal endpoint for that animal. Observations are further complicated by the nesting of cells within tissues or tissue sections, which themselves are nested within animals. Here, we introduce principled resampling strategies that preserve the tissue-animal hierarchy of individual replicates and compute the uncertainty of multidimensional decompositions applied to global averages. Using molecular–phenotypic data from the mouse aorta and colon, we find that interpretation of decomposed latent variables (LVs) changes when PLSR models are resampled. Lagging LVs, which statistically improve global-average models, are unstable in resampled iterations that preserve nesting relationships, arguing that these LVs should not be mined for biological insight. Interestingly, resampling is less discriminatory for multidimensional regressions of in vitro data, where replicate-to-replicate variance is sufficiently low. Our work illustrates the challenges and opportunities in translating systems-biology approaches from cultured cells to living organisms. Nested resampling adds a straightforward quality-control step for interpreting the robustness of in vivo regression models.

Highlights

  • Simple multilinear methods, such as partial least squares regression (PLSR), are effective at interrelating dynamic, multivariate datasets of cell–molecular biology through high-dimensional arrays

  • We sought an implementation of PLSR that robustly analyzes in vivo datasets composed of temporal, multiparameter, and interrelated responses to perturbations

  • When applied to in vivo PLSR models, nested resampling is an effective way to home in on latent variables that are robust to the replicate fluctuations of individual inbred animals

Introduction

Simple multilinear methods, such as partial least squares regression (PLSR), are effective at interrelating dynamic, multivariate datasets of cell–molecular biology through high-dimensional arrays. Modern biology and physiology demand rich, quantitative, time-resolved observations obtained by different methods[1]. To analyze such datasets, statistical “data-driven” modeling[2] approaches have been productively deployed in vitro to examine network-level relationships between signal transduction and cell phenotype[3,4,5,6,7,8,9]. We apply computational statistics[34] to the construction and interpretation of in vivo PLSR models built from multidimensional arrays. Neither is especially informative at discriminating latent variables when applied to a highly reproducible[35] multidimensional dataset collected in vitro, bolstering the claims of earlier studies with cultured cells[3,4,5,6,7,8,9]. By leveraging the structure of multidimensional arrays, nested resampling provides a rapid numerical means to incorporate the uncertainty of in vivo observations into data-driven models without violating their mathematical assumptions.
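
To make the idea of nested resampling concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-level bootstrap for a single experimental condition: animals are drawn with replacement, and tissue sections are then drawn with replacement within each drawn animal, so that sections are never pooled across animals. The function names (`nested_resample_mean`, `nested_bootstrap`), the data layout (one array of shape sections × features per animal), and the toy values are all hypothetical choices made for illustration.

```python
import numpy as np

def nested_resample_mean(animals, rng):
    """Return one resampled condition mean that respects the hierarchy.

    `animals` is a list with one array per animal, each of shape
    (n_sections, n_features). Assumed scheme: resample animals first,
    then resample sections only within the animal they came from.
    """
    n_animals = len(animals)
    drawn = rng.integers(0, n_animals, size=n_animals)  # animals, with replacement
    animal_means = []
    for idx in drawn:
        sections = animals[idx]
        rows = rng.integers(0, sections.shape[0], size=sections.shape[0])
        animal_means.append(sections[rows].mean(axis=0))  # average within animal
    return np.mean(animal_means, axis=0)  # then average across animals

def nested_bootstrap(animals, n_iter=500, seed=0):
    """Stack `n_iter` nested resamples into an (n_iter, n_features) array."""
    rng = np.random.default_rng(seed)
    return np.stack([nested_resample_mean(animals, rng) for _ in range(n_iter)])

# Toy data (hypothetical): three animals with different baselines and
# different numbers of sections, two measured features each.
toy_rng = np.random.default_rng(1)
toy_animals = [
    toy_rng.normal(loc=mu, scale=0.2, size=(n_sections, 2))
    for mu, n_sections in [(0.0, 4), (1.0, 3), (2.0, 5)]
]

resampled = nested_bootstrap(toy_animals, n_iter=200, seed=0)
spread = resampled.std(axis=0)  # per-feature uncertainty of the condition mean
```

In a workflow like the one described above, each row of `resampled` would stand in for the global average of one condition, and a regression model would be rebuilt on each resampled dataset to see which latent variables remain stable. That downstream step is omitted here because the source excerpt does not specify it.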
