Abstract

Motivation: High-throughput phenomic projects generate complex data from small treatment and large control groups that increase the power of the analyses but introduce variation over time. A method is needed to utilize a set of temporally local controls that maximizes analytic power while minimizing noise from unspecified environmental factors.

Results: Here we introduce ‘soft windowing’, a methodological approach that selects a window of time that includes the most appropriate controls for analysis. Using phenotype data from the International Mouse Phenotyping Consortium (IMPC), adaptive windows were applied such that control data collected proximally to mutants were assigned the maximal weight, while data collected earlier or later had less weight. We applied this method to IMPC data and compared the results with those obtained from a standard non-windowed approach. Validation was performed using a resampling approach in which we demonstrate a 10% reduction of false positives from 2.5 million analyses. We applied the method to our production analysis pipeline that establishes genotype–phenotype associations by comparing mutant versus control data. We report an increase of 30% in significant P-values, as well as linkage to 106 versus 99 disease models via phenotype overlap with the soft-windowed and non-windowed approaches, respectively, from a set of 2082 mutant mouse lines. Our method is generalizable and can benefit large-scale human phenomic projects such as the UK Biobank and the All of Us resources.

Availability and implementation: The method is freely available in the R package SmoothWin, available on CRAN http://CRAN.R-project.org/package=SmoothWin.

Supplementary information: Supplementary data are available at Bioinformatics online.

Summary

Introduction

High-throughput, large-scale phenotyping studies evaluate variables of an organism’s biological systems to examine the contribution of genetic and environmental factors to phenotypes. The IMPC has phenotyped over 148 000 knockouts and 43 000 control mice (data release 9.2, January 2019) across 12 research centres in 9 countries. These centres adhere to a set of standardized phenotype assays defined in the International Mouse Phenotyping Resource of Standardised Screens (IMPReSS), which are designed to measure over 200 parameters on each mouse. This approach is unsatisfactory for IMPC data as some mutant lines had enough experimental mice to measure in one batch, while others needed multiple batches over 18 months due to breeding difficulties or other factors. This variation in time-frames can lead to a widely different number of controls being applied to an analysis, making it challenging to explore correlations between mutant lines. We show how to tune the parameters and demonstrate the implementation of soft windowing on the IMPC data.
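
The idea of weighting controls by their temporal proximity to the mutants can be illustrated with a minimal Python sketch. It is not the SmoothWin implementation or its API. The function names, the parameters `half_width` and `sharpness`, and the choice of a product of two logistic curves as the window shape are all assumptions made for illustration. Controls measured near a mutant date get a weight close to 1, and the weight decays smoothly for earlier or later dates. The weights are then passed to an ordinary weighted least-squares fit of phenotype on genotype.

```python
import numpy as np


def _sigmoid(x):
    # Numerically stable logistic function (avoids overflow in exp).
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def soft_window_weights(t, mutant_times, half_width, sharpness):
    """Hypothetical soft-window weights (illustrative, not SmoothWin's API).

    t            : measurement times of the animals (e.g. days)
    mutant_times : time(s) at which mutants were measured
    half_width   : assumed half-width of the plateau around each mutant time
    sharpness    : assumed steepness of the decay at the window edges
    """
    t = np.asarray(t, dtype=float)
    # Peak value of one window, used so that the weight at a mutant time is 1.
    peak = _sigmoid(sharpness * half_width) ** 2
    weights = np.zeros_like(t)
    for m in np.atleast_1d(mutant_times):
        rise = _sigmoid(sharpness * (t - (m - half_width)))
        fall = _sigmoid(sharpness * ((m + half_width) - t))
        # Several mutant batches: keep the largest weight at each time point.
        weights = np.maximum(weights, rise * fall / peak)
    return weights


def weighted_genotype_effect(y, is_mutant, weights):
    """Weighted least-squares estimate of the mutant-versus-control difference."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(is_mutant, dtype=float)])
    sw = np.sqrt(np.asarray(weights, dtype=float))
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[1]


# Toy data (invented): controls drift upward over time, mutants measured at day 90.
control_t = np.arange(0, 101, dtype=float)
control_y = 0.1 * control_t           # baseline drift in the controls
mutant_t = np.full(5, 90.0)
mutant_y = np.full(5, 0.1 * 90 + 2)   # true genotype effect of 2 at day 90

t_all = np.concatenate([control_t, mutant_t])
y_all = np.concatenate([control_y, mutant_y])
is_mutant = np.concatenate([np.zeros(control_t.size), np.ones(mutant_t.size)])

w = soft_window_weights(t_all, mutant_times=[90.0], half_width=10.0, sharpness=1.0)
effect_windowed = weighted_genotype_effect(y_all, is_mutant, w)
effect_unwindowed = weighted_genotype_effect(y_all, is_mutant, np.ones_like(w))
print(effect_windowed, effect_unwindowed)
```

In this toy setting the non-windowed fit compares the mutants with the average of all controls and absorbs the drift into the genotype effect, whereas the windowed fit compares them mainly with controls from around day 90. In this sketch, `half_width` and `sharpness` stand in for the tuning parameters mentioned above.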

System and methods
Algorithm
Weight generating function
Windowing regression
Selection of the tuning parameters
Sensitivity analysis
Simulation study
Soft windowing as part of the IMPC statistics pipeline
Procedure name
Discussion
