Abstract

Next-generation sequencing technologies have made it possible to sequence large samples of cases and controls to test for association with rare variants. To limit costs and increase sample sizes, control data may be reused across multiple studies and may therefore have been generated on different sequencing platforms. This can compromise comparability between cases and controls, because batch effects may act as confounders and produce false-positive association signals. Stringent quality control is therefore required to limit batch effects and ensure that datasets are comparable. We propose RAVAQ, an integrative five-step pipeline that (a) performs a dedicated three-step quality control that takes case-control status into account to ensure data comparability, (b) selects qualifying variants as defined by the user, and (c) performs rare variant association tests per genomic region. RAVAQ is implemented as an R package. It is user-friendly, and its arguments are flexible enough to adapt to the specific needs of each research project. We provide examples showing how RAVAQ improves rare variant association testing. The default RAVAQ quality control outperformed the widely used Variant Quality Score Recalibration (VQSR) method, removing the inflation caused by spurious signals. RAVAQ is open source and freely available at https://gitlab.com/gmarenne/ravaq.
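
RAVAQ itself is an R package, and this abstract does not specify its quality-control criteria or its test statistics. Purely to illustrate why quality control that takes case-control status into account matters, the Python sketch below chains three generic steps on one genomic region: a per-group call-rate filter, a rare "qualifying variant" filter, and a carrier-based collapsing (burden) test scored with Fisher's exact test. All function names, thresholds, the genotype coding, and the choice of these particular filters and this test are assumptions made for illustration. They are not RAVAQ's API or method.

```python
from math import comb

# ASSUMPTION: genotypes are alternate-allele counts (0, 1, 2) or None when
# missing, and the alternate allele is the rare allele of interest.


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def p_table(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = p_table(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(p_table(x) for x in support if p_table(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def call_rate(genos):
    """Fraction of non-missing genotypes (0.0 for an empty group)."""
    return sum(g is not None for g in genos) / len(genos) if genos else 0.0


def passes_group_qc(genos, is_case, min_rate):
    """Hypothetical case/control-aware QC: the call rate must reach min_rate in
    cases AND in controls, so a variant badly called in one batch is dropped."""
    cases = [g for g, s in zip(genos, is_case) if s]
    controls = [g for g, s in zip(genos, is_case) if not s]
    return call_rate(cases) >= min_rate and call_rate(controls) >= min_rate


def is_qualifying(genos, max_maf):
    """Hypothetical qualifying rule: alternate-allele frequency in (0, max_maf]."""
    called = [g for g in genos if g is not None]
    if not called:
        return False
    af = sum(called) / (2 * len(called))
    return 0 < af <= max_maf


def burden_test(region, is_case, min_rate=0.9, max_maf=0.01):
    """Carrier-based collapsing test for one region (hypothetical helper).

    region maps variant IDs to per-sample genotype lists in the same sample
    order as is_case. Returns (two-sided Fisher p-value, number of variants kept).
    """
    kept = [g for g in region.values()
            if passes_group_qc(g, is_case, min_rate) and is_qualifying(g, max_maf)]
    # A sample is a carrier if it has >= 1 alternate allele at any kept variant
    # (None and 0 are both falsy, so missing calls count as non-carriers).
    carrier = [any(g[i] for g in kept) for i in range(len(is_case))]
    a = sum(car and s for car, s in zip(carrier, is_case))      # case carriers
    b = sum(is_case) - a                                         # case non-carriers
    c = sum(car and not s for car, s in zip(carrier, is_case))  # control carriers
    d = len(is_case) - a - b - c                                 # control non-carriers
    return fisher_two_sided(a, b, c, d), len(kept)


if __name__ == "__main__":
    is_case = [True] * 10 + [False] * 10
    region = {
        "v1": [1] * 6 + [0] * 4 + [0] * 10,     # rare, well called everywhere
        "v2": [0] * 6 + [1] * 4 + [None] * 10,  # never called in controls
        "v3": [1] * 20,                         # common, not qualifying
    }
    p_qc, n_qc = burden_test(region, is_case, min_rate=0.9, max_maf=0.25)
    p_raw, n_raw = burden_test(region, is_case, min_rate=0.0, max_maf=0.25)
    print(f"with per-group QC: p={p_qc:.4f}, variants kept={n_qc}")
    print(f"without QC:        p={p_raw:.1e}, variants kept={n_raw}")
```

In this toy region, variant `v2` has no called genotypes in controls, as could happen when controls were sequenced on another platform. The per-group call-rate filter removes it (p of about 0.0108 from `v1` alone). Without that filter, `v2` turns every case into a carrier and the p-value drops to about 1.1e-5, which is the kind of spurious signal the abstract describes.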
