Abstract

The All of Us (AoU) Research Program provides a comprehensive genomic dataset intended to accelerate health research and medical breakthroughs. Researchers who use it nevertheless face significant challenges, notably the high cost and inefficiency of data extraction and analysis. AoUPRS addresses these challenges with a cost-effective tool for calculating polygenic risk scores (PRS), so that both experienced and novice researchers can use the AoU dataset for genomic discovery. AoUPRS is implemented in Python and uses the Hail framework for genomic data analysis. It offers two approaches to PRS calculation: the Hail MatrixTable (MT) and the Hail Variant Dataset (VDS). The MT approach works on a dense representation of genotype data, whereas the VDS approach works on a sparse representation, which significantly reduces computational cost. In performance evaluations, the VDS approach demonstrated a cost reduction of up to 99.51% for smaller scores and 85% for larger scores compared to the MT approach. Both approaches yielded similar predictive power in logistic regression analyses of PRS for coronary artery disease, atrial fibrillation, and type 2 diabetes, and the empirical cumulative distribution functions (ECDFs) of the PRS values further confirmed the consistency between the two methods. By offering both dense and sparse data processing, AoUPRS lets researchers choose the approach best suited to their needs. The tool is open source and available on GitHub, with detailed documentation and tutorials to support use by the scientific community.
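The dense versus sparse distinction can be illustrated with a minimal sketch. This is not the AoUPRS implementation and does not use Hail; it is plain Python under stated assumptions: a PRS is taken to be the weighted sum of effect-allele dosages, the effect allele is assumed to be the non-reference allele at every variant, and the function names, weights, and genotypes are hypothetical toy values. The dense version visits every sample-by-variant cell, while the sparse version visits only the non-reference calls, yet both produce the same scores.

```python
from typing import Dict, List, Tuple


def prs_dense(dosages: List[List[int]], weights: List[float]) -> List[float]:
    """Score from a dense matrix: dosages[i][j] is the effect-allele
    count (0, 1, or 2) of sample i at variant j. Every cell is visited."""
    return [sum(d * w for d, w in zip(row, weights)) for row in dosages]


def prs_sparse(
    calls: Dict[Tuple[int, int], int],
    weights: List[float],
    n_samples: int,
) -> List[float]:
    """Score from sparse calls: only non-reference genotypes are stored,
    keyed by (sample index, variant index). Missing keys mean dosage 0."""
    scores = [0.0] * n_samples
    for (sample, variant), dosage in calls.items():
        scores[sample] += dosage * weights[variant]
    return scores


# Hypothetical toy data: 3 variants with made-up effect weights, 4 samples.
weights = [0.12, -0.05, 0.30]
dense = [
    [0, 1, 0],
    [2, 0, 0],
    [0, 0, 0],
    [1, 0, 1],
]

# Derive the sparse form by keeping only the non-zero (non-reference) cells.
sparse = {
    (i, j): d
    for i, row in enumerate(dense)
    for j, d in enumerate(row)
    if d != 0
}

dense_scores = prs_dense(dense, weights)
sparse_scores = prs_sparse(sparse, weights, n_samples=len(dense))
```

In this toy case the sparse form stores 4 entries instead of 12 cells. Most genotypes at most variants are homozygous reference, which is the intuition behind the lower cost of a sparse representation. Handling variants whose effect allele is the reference allele would need extra logic that this sketch leaves out.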
