Abstract

Next-generation sequencing (NGS) has enabled the analysis of rare and uncommon variants in large study cohorts. A common strategy to overcome their low frequencies and/or small effect sizes relies on collapsing, i.e. binning variants within genes or regions. Several tools are now available for advanced statistical analyses; however, tools to perform basic tasks, such as obtaining allelic counts within defined genetic boundaries, are unavailable or require complex coding. The GARCOM library, an open-source, freely available R package, returns a matrix of allelic counts within defined genetic boundaries. GARCOM accepts input data in PLINK or VCF format, with additional options to subset the data for refined analyses.
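The core operation described above, summing each individual's allelic counts over the variants that fall inside a gene or region, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not GARCOM's R implementation or API: the function name count_alleles_by_region, the input layout (one list of 0/1/2 counts per individual, SNP coordinates as (chromosome, position) pairs, inclusive region boundaries) and the choice to skip missing calls rather than impute them are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input layout (illustrative only, not GARCOM's API):
#   dosages[i][j] -> allele count (0, 1, 2) of individual i at SNP j, or None if missing
#   snp_pos[j]    -> (chromosome, base-pair position) of SNP j
#   regions       -> {region name: (chromosome, start, end)}, boundaries inclusive


def count_alleles_by_region(
    dosages: List[List[Optional[int]]],
    snp_pos: List[Tuple[str, int]],
    regions: Dict[str, Tuple[str, int, int]],
) -> Dict[str, List[int]]:
    """Collapse SNP-level allele counts into one count per region per individual."""
    collapsed: Dict[str, List[int]] = {}
    for name, (chrom, start, end) in regions.items():
        # Indices of the SNPs whose position falls inside this region
        inside = [j for j, (c, pos) in enumerate(snp_pos)
                  if c == chrom and start <= pos <= end]
        # Sum each individual's counts over those SNPs; missing calls are skipped (assumption)
        collapsed[name] = [
            sum(row[j] for j in inside if row[j] is not None) for row in dosages
        ]
    return collapsed


# Toy data: 3 individuals x 4 SNPs, 3 regions (all values made up)
dosages = [[0, 1, 2, None], [1, 0, 0, 1], [None, 2, 1, 0]]
snp_pos = [("1", 100), ("1", 250), ("1", 900), ("2", 50)]
regions = {"GENE_A": ("1", 1, 300), "GENE_B": ("1", 800, 1000), "GENE_C": ("2", 1, 100)}
counts = count_alleles_by_region(dosages, snp_pos, regions)
print(counts)  # {'GENE_A': [1, 1, 2], 'GENE_B': [2, 0, 1], 'GENE_C': [0, 1, 0]}
```

The linear scan over every SNP for every region is kept for clarity; a tool working on whole-genome data would more likely sort variants by position or use an interval index so that each region lookup does not touch the entire variant list.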

Highlights

  • Genetic data were recoded using PLINK's --recode A flag. On both chromosomes, memory consumption and run time increased with the number of individuals processed (Figure 2)
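The PLINK --recode A output mentioned above is a whitespace-delimited .raw text file: six sample columns (FID, IID, PAT, MAT, SEX, PHENOTYPE) followed by one column per variant, named after the variant ID and the counted allele, holding 0/1/2 allele counts or NA for missing calls. A minimal Python reader for that layout is sketched below; it assumes hard-called integer genotypes, and the function name parse_raw and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple

# A PLINK 1.9 .raw file starts with six sample columns: FID IID PAT MAT SEX PHENOTYPE
N_SAMPLE_COLS = 6


def parse_raw(lines: List[str]) -> Tuple[List[str], List[str], List[List[Optional[int]]]]:
    """Parse the text of a .raw file written by `plink --recode A`.

    Returns (sample IIDs, variant column names, allele-count matrix); each matrix
    entry counts the allele named in the column suffix, or is None for NA.
    Assumes hard-called genotypes, i.e. integer counts 0/1/2.
    """
    header = lines[0].split()
    variants = header[N_SAMPLE_COLS:]
    iids: List[str] = []
    matrix: List[List[Optional[int]]] = []
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        iids.append(fields[1])  # the second column is the individual ID (IID)
        matrix.append([None if v == "NA" else int(v) for v in fields[N_SAMPLE_COLS:]])
    return iids, variants, matrix


# Toy .raw content (made-up samples and variants)
raw_text = """FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_G
F1 S1 0 0 1 -9 0 2
F2 S2 0 0 2 -9 1 NA
"""
iids, variants, matrix = parse_raw(raw_text.splitlines())
print(iids, variants, matrix)  # ['S1', 'S2'] ['rs1_A', 'rs2_G'] [[0, 2], [1, None]]
```

The .raw file carries no chromosome or base-pair coordinates, so assigning its columns to genes or regions also requires the accompanying .bim (or .map) file.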

Read more

Summary

Introduction

Genome-wide association studies (GWAS) have led to the identification of several common genomic variants associated with complex diseases,[1] yet a large proportion of heritability remains unexplained ("missing heritability"). The rapid decline in sequencing costs has enabled in-depth analysis of rare variants (RVs; minor allele frequency < 1%) through whole-genome sequencing (WGS) and whole-exome sequencing (WES), and large-scale reference panels have allowed RVs to be imputed.[3,4,5] The power to identify statistically significant RVs decreases as the minor allele frequency decreases; an ideal method to overcome this limitation is to group RVs at the gene/region level, usually via a collapsing test. Despite the availability of sophisticated tools for annotation, quality control and association analyses, tools to perform basic tasks, such as obtaining allelic counts within defined genetic boundaries (genes and/or regions), are, to our knowledge, lacking. R libraries such as BEDMatrix and bigsnpr[6] provide allelic counts for each SNP per individual, but algorithms to extract this information within genetic boundaries in a collapsed fashion are unavailable. We introduce a user-friendly R package, GARCOM ("Genetic And Regional Count of Mutations"), that provides allelic counts per individual within user-provided gene/regional boundaries.

Methods
Discussion
Gibson G