Generalizability of polygenic prediction models: how is the R² defined on test data?

Christian Staerk, Tobias Wistuba, Hannah Klinkhammer, Andreas Mayr, Carlo Maj

doi:10.1186/s12920-024-01905-8

Abstract

Polygenic risk scores (PRS) quantify an individual's genetic predisposition for different traits and are expected to play an increasingly important role in personalized medicine. A crucial challenge in clinical practice is the generalizability and transferability of PRS models to populations with different ancestries. When assessing the generalizability of PRS models for continuous traits, the R² is a commonly used measure of prediction accuracy. While the R² is a well-defined goodness-of-fit measure for statistical linear models, there exist different definitions for its application on test data, which complicates the interpretation and comparison of results. Based on large-scale genotype data from the UK Biobank, we compare three definitions of the R² on test data for evaluating the generalizability of PRS models to different populations. Polygenic models for several phenotypes, including height, BMI and lipoprotein A, are derived from training data with European ancestry using state-of-the-art regression methods and are evaluated on various test populations with different ancestries. Our analysis shows that the choice of the R² definition can lead to considerably different results on test data, making the comparison of R² values from the literature problematic. While the definition as the squared correlation between predicted and observed phenotypes solely addresses discriminative performance and always yields values between 0 and 1, definitions of the R² based on the mean squared prediction error (MSPE) with reference to intercept-only models assess both discrimination and calibration. These MSPE-based definitions can yield negative values, indicating miscalibrated predictions for out-of-target populations. We argue that the choice of the most appropriate R² definition depends on the aim of the PRS analysis: whether it serves primarily for risk stratification or also for individual phenotype prediction.
Moreover, correlation-based and MSPE-based definitions of the R² can provide valuable complementary information. Awareness of the different definitions of the R² on test data is necessary to facilitate the reporting and interpretation of results on PRS generalizability. We recommend explicitly stating which definition was used when reporting R² values on test data. Further research is warranted to develop and evaluate well-calibrated polygenic models for diverse populations.
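The contrast between correlation-based and MSPE-based definitions can be illustrated with a minimal sketch. The abstract does not spell out the formulas, so the following is an assumption: the two MSPE-based variants are taken to be R² = 1 − MSPE(model) / MSPE(intercept-only model), where the intercept-only reference predicts either the training-data mean or the test-data mean. The function names are hypothetical and only serve this illustration. Predictions that are perfectly correlated with the observed phenotypes but shifted by a constant offset (a simple form of miscalibration) yield a squared correlation of 1, while both MSPE-based values become negative.

```python
def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def mspe(y, y_hat):
    """Mean squared prediction error between observed and predicted values."""
    return mean([(a - b) ** 2 for a, b in zip(y, y_hat)])


def r2_cor(y, y_hat):
    """Squared Pearson correlation (discrimination only, lies in [0, 1]).

    Undefined if y or y_hat is constant (division by zero).
    """
    my, mp = mean(y), mean(y_hat)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_hat))
    ss_y = sum((a - my) ** 2 for a in y)
    ss_p = sum((b - mp) ** 2 for b in y_hat)
    return cov ** 2 / (ss_y * ss_p)


def r2_mspe_train_ref(y_test, y_hat, y_train_mean):
    """Hypothetical variant: reference model predicts the training mean."""
    ref = mspe(y_test, [y_train_mean] * len(y_test))
    return 1 - mspe(y_test, y_hat) / ref


def r2_mspe_test_ref(y_test, y_hat):
    """Hypothetical variant: reference model predicts the test mean."""
    ref = mspe(y_test, [mean(y_test)] * len(y_test))
    return 1 - mspe(y_test, y_hat) / ref


if __name__ == "__main__":
    # Toy data (not from the study): predictions are perfectly correlated
    # with the observed values but systematically shifted by +3.
    y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_pred = [v + 3.0 for v in y_obs]
    print(r2_cor(y_obs, y_pred))                   # 1.0
    print(r2_mspe_test_ref(y_obs, y_pred))         # -3.5
    print(r2_mspe_train_ref(y_obs, y_pred, 2.0))   # -2.0
```

Because the two MSPE-based variants differ only in the reference prediction, they coincide whenever the training mean equals the test mean; a shift in the mean phenotype between the training and test populations is exactly the situation in which they diverge.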
