Abstract

e13538 Background: In clinical cancer prevention, polygenic score (PGS) models aggregate the effects of multiple genetic variants to inform the risk of developing specific traits, such as cancer. The PGS Catalog is an open repository of published models, which currently has 573 models for various types of cancer. Despite all the metadata provided, selecting an appropriate model for clinical applications is challenging. Here we present a methodology and platform to support decision-making in the selection of those models. Methods: We developed a web-based tool to explore and compare the models available in the PGS Catalog. First, we cleaned all available metadata and loaded them into our platform. Our tool shows the evaluation metrics by performance, development method, ancestry, and covariates. Then, hypotheses can be generated by exploring our visualization tool. Finally, we present the results for all cancer models in a systematic approach to identify best practices to follow. We focused on breast, prostate, lung, and skin cancer models, evaluated in European-ancestry testing sets with three metrics: AUC, odds ratio (OR), and Nagelkerke's pseudo R². Results: As of January 2023, there were 254, 126, 124, and 42 models evaluated on European individuals for breast, skin, prostate, and lung cancer traits, respectively. Their average AUCs were 0.62, 0.66, 0.66, and 0.6, respectively. Not all models were evaluated for AUC, but it was the most frequently used metric. Their average ORs were 1.47, 1.71, 1.3, and 1.43, respectively, and their average pseudo R² values were 0.03, 0.08, 0.03, and 0.03, respectively. Regarding covariates, age, sex, and principal components significantly improved model accuracy across all cancer types.
The best-performing models for breast cancer were SNPnet, LASSO, and PRS-CS; for prostate cancer, SNPnet, PRS-CS, and GWAS-based selection; for lung cancer, SNPnet, LASSO, and Pruning/Clumping and Thresholding; and for skin cancer, SNPnet and GWAS-based selection. Conclusions: Our PGS Explorer allows the identification of relevant characteristics for model selection. Prostate cancer models showed the best performance and should be considered for prospective clinical validation. Models developed using SNPnet, LASSO, GWAS, and PRS-CS showed the best performance, but promising yet rarely used methods (e.g., LDpred2, PRSice-2, Elastic Net) should also be considered. Most models evaluated were trained using European cohorts, which limits the ability to estimate the accuracy of these models in other ancestries. We recommend strengthening the collection and sequencing of multi-ethnic cohorts. In the future, a fair evaluation pipeline could shed more light on PGS performance. PGS Explorer is publicly available to the scientific community on our website (pgs.amphorahealth.com).
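
For readers unfamiliar with the quantities compared above, the following is a minimal illustrative sketch, not the authors' pipeline or the PGS Catalog's scoring code. It assumes a PGS is computed as a weighted sum of effect-allele dosages and that AUC is estimated as the fraction of case-control pairs in which the case has the higher score. The function names, weights, genotypes, and labels are all hypothetical toy values.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1, or 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical per-variant effect weights and genotypes for four individuals.
weights = [0.2, -0.1, 0.4]
genotypes = [[2, 0, 1], [0, 2, 0], [1, 1, 2], [0, 1, 0]]
labels = [1, 0, 1, 0]  # 1 = case, 0 = control (toy labels)

scores = [polygenic_score(g, weights) for g in genotypes]
print("scores:", scores)
print("AUC:", auc(scores, labels))
```

In this toy data both cases score above both controls, so the AUC is 1.0. The averages reported above (0.60 to 0.66) indicate far weaker separation in real European-ancestry testing sets.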

