Abstract

Background: Recent statistical methods for next generation sequencing (NGS) data have been successfully applied to identifying rare genetic variants associated with certain diseases. However, most commonly used methods (e.g., burden tests and variance-component tests) rely on large sample sizes. Yet, because of its still-high cost, NGS data are generally restricted to small sample sizes that cannot be analyzed by most existing methods.

Methods: In this work, we propose a new exact association test for sequencing data that does not require a large-sample approximation and that is applicable to both common and rare variants. Our method, based on the Generalized Cochran-Mantel-Haenszel (GCMH) statistic, was applied to NGS datasets from intraductal papillary mucinous neoplasm (IPMN) patients. IPMN is a unique pancreatic cancer subtype that can turn into an invasive and hard-to-treat metastatic disease.

Results: Application of our method to IPMN data successfully identified susceptible genes associated with progression of IPMN to pancreatic cancer.

Conclusions: Our method is expected to improve the identification of disease-associated genetic variants and their corresponding signaling pathways, improving our understanding of the etiology and prognosis of specific diseases.

Highlights

  • Recent statistical methods for next generation sequencing (NGS) data have been successfully applied to identifying rare genetic variants associated with certain diseases

  • We demonstrate that our proposed Exact Association Test (EXAT) method can successfully identify susceptible genes associated with the progression of intraductal papillary mucinous neoplasm (IPMN) to pancreatic cancer

  • In this study, we proposed an association test, Exact Association Test (EXAT), for identifying rare variants, and assessed its performance against other methods of analyzing small sample-size datasets associated with the intraductal papillary mucinous neoplasm (IPMN) subtype of pancreatic cancer


Summary

Introduction

Recent statistical methods for next generation sequencing (NGS) data have been successfully applied to identifying rare genetic variants associated with certain diseases. Yet, because of its still-high cost, NGS data are generally restricted to small sample sizes that cannot be analyzed by most existing methods. Many genetic studies, such as genome-wide association studies (GWAS), have successfully identified genetic variants associated with complex human traits and diseases [1]. The cohort allelic sum test (CAST) collapses genotypes across all variants, such that an individual is coded as 1 if a rare allele is present at any of the variant sites and as 0 otherwise [6]. This approach may not fully reflect the effect emerging from the complex ensemble of multiple rare variants, because it uses only the presence or absence of rare variants within a specific genomic region.
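
The CAST-style collapsing described above can be illustrated with a minimal Python sketch. It assumes genotypes are stored as per-individual lists of rare-allele counts (0, 1, or 2) at each variant site. The function names `cast_collapse` and `carrier_table` and the toy data are hypothetical and are not taken from the paper. The sketch shows only the collapsing step and the resulting carrier-by-status table, not the proposed EXAT method.

```python
from typing import List, Sequence


def cast_collapse(genotypes: Sequence[Sequence[int]],
                  rare_sites: Sequence[int]) -> List[int]:
    """Code each individual as 1 if a rare allele is present at any
    of the given variant sites, otherwise 0 (CAST-style collapsing)."""
    return [int(any(row[j] > 0 for j in rare_sites)) for row in genotypes]


def carrier_table(indicator: Sequence[int],
                  is_case: Sequence[bool]) -> List[List[int]]:
    """Build a 2x2 table: rows = (cases, controls),
    columns = (carriers, non-carriers)."""
    table = [[0, 0], [0, 0]]
    for carrier, case in zip(indicator, is_case):
        table[0 if case else 1][0 if carrier else 1] += 1
    return table


# Toy data (hypothetical): 4 individuals, 3 rare-variant sites in one region.
genotypes = [
    [0, 0, 1],  # case, carries one rare allele
    [0, 0, 0],  # case, no rare alleles
    [2, 0, 0],  # control, homozygous at the first site
    [0, 0, 0],  # control, no rare alleles
]
is_case = [True, True, False, False]

indicator = cast_collapse(genotypes, rare_sites=[0, 1, 2])
table = carrier_table(indicator, is_case)

# Information loss: one rare allele and four rare alleles collapse to the same code.
light_and_heavy = cast_collapse([[1, 0, 0], [2, 1, 1]], rare_sites=[0, 1, 2])
```

In this sketch, `light_and_heavy` assigns the same code to an individual with a single rare allele and to one carrying rare alleles at every site. This is the limitation noted above: the collapsed indicator discards how many rare variants an individual carries and where they fall.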
