Abstract

Low-coverage next-generation sequencing experiments assisted by statistical methods are popular in genetic association studies. Next-generation sequencing experiments produce genotype data that include allele read counts and read depths. At low sequencing depths, the genotypes tend to be highly uncertain; therefore, uncertain genotypes are usually removed or imputed before a statistical analysis is performed. This practice may result in an inflated type I error rate and a loss of statistical power. In this paper, we propose a mixture-based penalized score association test that adjusts for non-genetic covariates. The proposed score test statistic is based on a sandwich variance estimator, so it is robust to model misspecification between the covariates and the latent genotypes. The proposed method has the advantage of requiring neither external imputation nor elimination of uncertain genotypes. The results of our simulation study show that the type I error rates are well controlled and that the proposed association test has reasonable statistical power. As an illustration, we apply our statistic to pharmacogenomics data on drug responsiveness among 400 epilepsy patients.

Highlights

  • Genome-wide association study (GWAS) is a powerful tool for screening a high-dimensional genome data set and selecting candidate genetic variants, such as single nucleotide polymorphisms (SNPs), in genetic association studies

  • Many next-generation sequencing (NGS) experiments rely on low-coverage sequencing of a large sample because there is a trade-off between sample size and sequencing depth [9,10]

  • We developed mixture-based genetic association tests that adjust for the effects of non-genetic covariates in low-coverage NGS data



Introduction

Genome-wide association study (GWAS) is a powerful tool for screening a high-dimensional genome data set and selecting candidate genetic variants, such as single nucleotide polymorphisms (SNPs), in genetic association studies. Numerous studies have used next-generation sequencing (NGS) data, including rare-variant association studies [1,2], pharmacogenomics [3,4], machine learning and deep learning applications [5,6], and big data analysis [7,8]. Many NGS experiments rely on low-coverage sequencing of a large sample because there is a trade-off between sample size and sequencing depth [9,10]. In low-coverage NGS data, the inferred genotypes are often highly uncertain, which leads to biased and unreliable results in genetic association analyses. In genetic research based on NGS data, it is therefore important to obtain accurate genotypes to perform an association analysis.
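To make the link between read depth and genotype uncertainty concrete, the following minimal sketch computes posterior genotype probabilities from allele read counts. It is an illustration only, not the method proposed in this paper: the binomial read model, the symmetric per-read error rate `error_rate`, the Hardy-Weinberg prior with alternate allele frequency `alt_freq`, and the function name `genotype_posteriors` are all assumptions introduced here.

```python
from math import comb


def genotype_posteriors(alt_reads, depth, error_rate=0.01, alt_freq=0.2):
    """Posterior P(G = g | reads) for g = 0, 1, 2 copies of the alternate allele.

    Illustrative assumptions (not taken from the paper): reads are sampled
    binomially, sequencing errors are symmetric with rate `error_rate`, and
    the genotype prior follows Hardy-Weinberg proportions.
    """
    # Probability that a single read shows the alternate allele, per genotype.
    p_alt = [error_rate, 0.5, 1.0 - error_rate]

    # Hardy-Weinberg prior on the three genotypes.
    q = alt_freq
    prior = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]

    # Binomial likelihood of the observed read counts under each genotype.
    ref_reads = depth - alt_reads
    lik = [comb(depth, alt_reads) * p ** alt_reads * (1 - p) ** ref_reads
           for p in p_alt]

    # Bayes' rule: normalize prior times likelihood.
    joint = [l * pr for l, pr in zip(lik, prior)]
    total = sum(joint)
    return [j / total for j in joint]


# Same observed alternate-read fraction (zero), two different depths.
low = genotype_posteriors(alt_reads=0, depth=2)
high = genotype_posteriors(alt_reads=0, depth=20)

print("depth  2:", [round(p, 4) for p in low])
print("depth 20:", [round(p, 4) for p in high])
```

Under these assumptions, observing no alternate reads at depth 2 still leaves a non-negligible posterior probability that the sample is heterozygous, whereas the same observation at depth 20 makes that genotype very unlikely. A hard genotype call at low depth discards this uncertainty, which is the motivation for methods that work with the read counts directly.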
