Abstract

The analysis of whole-genome sequencing studies is challenging due to the large number of noncoding rare variants, our limited understanding of their functional effects, and the lack of natural units for testing. Here we propose a scan statistic framework, WGScan, to simultaneously detect the existence, and estimate the locations, of association signals at genome-wide scale. WGScan can analytically estimate the significance threshold for a whole-genome scan; utilize summary statistics for a meta-analysis; incorporate functional annotations for enhanced discoveries in noncoding regions; and enable enrichment analyses using genome-wide summary statistics. Based on the analysis of whole genomes of 1,786 phenotypically discordant sibling pairs from the Simons Simplex Collection study for autism spectrum disorders, we derive genome-wide significance thresholds for whole-genome sequencing studies and detect significant enrichments of regions showing associations with autism in promoter regions, functional categories related to autism, and enhancers predicted to regulate expression of autism-associated genes.

Highlights

  • The analysis of whole-genome sequencing studies is challenging due to the large number of noncoding rare variants, our limited understanding of their functional effects, and the lack of natural units for testing

  • We focused on nineteen sets of genes and an additional set of regulatory non-coding elements to investigate whether the association signals from WGScan are enriched in those sets

  • WGScan has several important advantages, namely (1) it scans for association in both coding and non-coding regions using multiple window sizes and multiple functional annotations, while adjusting for the correlation among test statistics, (2) it builds on existing sequence-based association tests such as burden and SKAT, (3) it needs only summary statistics, so it can be applied to the meta-analysis of multiple data sets, and (4) it is computationally efficient
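
To make the idea of scanning with multiple window sizes concrete, the following is a minimal sketch in Python. It is not the WGScan implementation: it assumes a binary phenotype, an intercept-only null model, and an unweighted burden score test, and it omits SKAT, functional annotation weights, the correlation adjustment, and the analytic significance threshold. The function names `burden_z` and `scan`, the window sizes, and the toy data are all hypothetical and chosen for illustration only.

```python
import numpy as np

def burden_z(G, y):
    """Score-test Z statistic for an unweighted burden in one window.

    G: (samples x variants) minor-allele counts for the window.
    y: binary phenotype (1 = case, 0 = control); intercept-only null model.
    """
    b = G.sum(axis=1)                      # per-sample burden in this window
    r = y - y.mean()                       # residuals under the null
    bc = b - b.mean()                      # centered burden
    var = y.mean() * (1 - y.mean()) * np.sum(bc ** 2)
    if var == 0:
        return 0.0                         # no variation, nothing to test
    return float(np.sum(bc * r) / np.sqrt(var))

def scan(G, y, window_sizes=(5, 10, 20)):
    """Return (max |Z|, start, size) over overlapping windows of several sizes.

    Windows are indexed by variant position and step by half their size.
    """
    best = (0.0, None, None)
    m = G.shape[1]
    for w in window_sizes:
        step = max(1, w // 2)
        for s in range(0, max(1, m - w + 1), step):
            z = abs(burden_z(G[:, s:s + w], y))
            if z > best[0]:
                best = (z, s, w)
    return best

# Toy data (hypothetical): 100 cases followed by 100 controls, 40 rare variants.
n, m = 200, 40
y = np.r_[np.ones(100), np.zeros(100)]
G = np.zeros((n, m))
for j in range(m):
    if 10 <= j < 15:
        continue
    G[50 + j, j] = 1        # null variant: one case carrier ...
    G[100 + j, j] = 1       # ... and one control carrier
for i in range(50):
    G[i, 10 + i % 5] = 1    # variants 10-14 are carried only by cases

z_max, start, size = scan(G, y)
print(f"max |Z| = {z_max:.2f} in window [{start}, {start + size})")
```

In this toy example the largest statistic comes from the smallest window that exactly covers the planted signal, which illustrates why scanning several window sizes helps locate a signal as well as detect it. Because the overlapping windows are correlated, the maximum cannot be judged against a single-test threshold; a real analysis needs a genome-wide threshold that accounts for that correlation.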

Summary

Introduction

Several large-scale whole-genome sequencing projects are ongoing, including the NHLBI Trans-Omics for Precision Medicine program (TOPMed), the NHGRI Genome Sequencing Program (GSP), the UK Biobank and the Simons Simplex Collection[1,2,3,4]. The analysis of such data sets is challenging, primarily because of the large number of rare variants in these studies, our inability to accurately predict the functional effects of variants in non-coding regions of the genome, and the lack of natural units (i.e., the analogue of genes in the coding regions) for testing. We perform enrichment analyses for the associated regions using several sets, including promoter regions, catalogued gene sets for various complex diseases and traits, enhancers predicted to regulate expression levels for genes in these sets, and Gene Ontology (GO) cellular components.
