Abstract

Advances in next-generation sequencing technologies have enabled the identification of multiple rare single nucleotide polymorphisms involved in diseases or traits. Several strategies for identifying rare variants that contribute to disease susceptibility have recently been proposed. A key feature of many of these statistical methods is the pooling, or collapsing, of multiple rare single nucleotide variants to achieve a reasonably high frequency and effect. However, if the pooled rare variants are associated with the trait in different directions, pooling may weaken the signal and thereby reduce statistical power. In the present paper, we propose a backward support vector machine (BSVM)-based variant selection procedure to identify informative disease-associated rare variants. In this procedure, the rare variants are weighted and collapsed according to the direction (positive or negative) of their association with the disease, which may itself be associated with both common variants and rare variants that have protective, deleterious, or neutral effects. This nonparametric variant selection procedure can account for confounding factors and can also be adopted in other regression frameworks. The results of a simulation study and a data example show that, under the scenarios considered, the proposed BSVM approach is more powerful than four other approaches while maintaining valid type I error rates.
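To make the cancellation problem concrete, here is a small toy sketch, assuming carrier genotypes coded as minor-allele counts (0/1/2) and a binary case/control label. It compares an unweighted burden score, where every rare variant gets weight +1, with a simple sign-aware weighting that gives +1 to variants enriched in cases and -1 to variants enriched in controls. The function names and the ±1 rule are illustrative assumptions; this is not the paper's risk measure (RM) weighting or the BSVM procedure. Because the signs are estimated from the same data, a real test built on such weights would need permutation-based p-values to keep the type I error valid.

```python
from typing import List, Sequence

# Rows = individuals, columns = rare variants (0/1/2 minor-allele counts).
Genotypes = Sequence[Sequence[int]]


def collapse(genotypes: Genotypes, weights: Sequence[float]) -> List[float]:
    """Collapse each individual's rare variants into one weighted burden score."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def signed_weights(genotypes: Genotypes, labels: Sequence[int]) -> List[float]:
    """Hypothetical sign-aware weights: +1 if the minor allele is more frequent in
    cases (risk-like), -1 if more frequent in controls (protective-like), 0 on a tie."""
    cases = [row for row, y in zip(genotypes, labels) if y == 1]
    controls = [row for row, y in zip(genotypes, labels) if y == 0]
    weights = []
    for j in range(len(genotypes[0])):
        case_freq = sum(row[j] for row in cases) / (2 * len(cases))
        ctrl_freq = sum(row[j] for row in controls) / (2 * len(controls))
        weights.append(1.0 if case_freq > ctrl_freq else -1.0 if case_freq < ctrl_freq else 0.0)
    return weights


def case_control_gap(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean burden in cases minus mean burden in controls (a crude signal measure)."""
    case = [s for s, y in zip(scores, labels) if y == 1]
    ctrl = [s for s, y in zip(scores, labels) if y == 0]
    return sum(case) / len(case) - sum(ctrl) / len(ctrl)


# Toy data: variant 0 is carried only by cases (risk), variant 1 only by controls (protective).
genotypes = [[1, 0], [1, 0], [0, 0], [0, 0],   # cases
             [0, 1], [0, 1], [0, 0], [0, 0]]   # controls
labels = [1, 1, 1, 1, 0, 0, 0, 0]

plain = case_control_gap(collapse(genotypes, [1.0, 1.0]), labels)
signed = case_control_gap(collapse(genotypes, signed_weights(genotypes, labels)), labels)
print(plain, signed)  # 0.0 1.0
```

With equal weights the risk and protective variants cancel and the case/control gap vanishes; flipping the sign of the protective variant recovers it.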

Highlights

  • Although common variants (CVs) that contribute to complex genetic diseases have been successfully identified by genome-wide association studies (GWAS), the identified loci explain only a portion of the heritability

  • The proposed risk measure (RM) weighting with the backward support vector machine (BSVM) method had the highest power among the five weightings under scenarios with risk variants or a mixture of risk and protective variants (Table 1)

  • We have proposed a novel data-adaptive BSVM-based selection procedure to identify both a region harboring rare variants (RVs) associated with complex traits and the individual variants within the disease-associated gene/region
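The summary does not spell out the BSVM selection rule, so the following is only a generic sketch of backward elimination driven by a linear SVM, in the spirit of SVM recursive feature elimination. It fits a soft-margin linear SVM by full-batch subgradient descent on the hinge loss and repeatedly drops the variant with the smallest absolute weight. The names `train_linear_svm` and `backward_select`, the solver, the hyperparameters (`lam`, `lr`, `epochs`), and the smallest-|weight| elimination rule are all assumptions, not the paper's procedure; the sketch also omits the RM weighting and any adjustment for confounders.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Soft-margin linear SVM fitted by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))), with labels y in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty (intercept not penalized)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this individual
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def backward_select(X: Sequence[Sequence[float]], y: Sequence[int], keep: int) -> List[int]:
    """Hypothetical backward elimination: refit the SVM on the remaining variants and
    drop the one with the smallest |weight| until `keep` variants are left."""
    active = list(range(len(X[0])))
    while len(active) > keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        active.pop(weakest)
    return active


# Toy carrier indicators: variant 0 separates cases (+1) from controls (-1);
# variants 1 and 2 are distributed identically in both groups (pure noise).
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0],
     [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
print(backward_select(X, y, keep=1))
```

In practice one would more likely use an established SVM solver and choose when to stop eliminating by cross-validation or a permutation criterion rather than a fixed `keep`; both choices here are simplifications.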


Introduction

Although common variants (CVs) that contribute to complex genetic diseases have been successfully identified by genome-wide association studies (GWAS), the identified loci explain only a portion of the heritability. This “missing heritability” is widely believed to result from other genetic mechanisms, such as gene-gene interactions, epigenetics, and rare variants (RVs). Much of the missing genetic component may be due to variants that have relatively large effects but are too rare to be detected by GWAS. If so, rapid advances in next-generation sequencing technologies should enable substantial progress in gene mapping. However, the statistical analysis of rare genetic variation is challenging: because rare alleles are present in only a small number of patients, the traditional variant-by-variant approach has inherently low power [1].
