Abstract

Interpretation of genetic variants plays an essential role in cancer and other diseases. Needs for variant interpretation span from basic research to informing profound clinical decisions. The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make predictions of the resulting phenotype. These predictions are evaluated against experimental or clinical characterizations by independent assessors. CAGI establishes objective reference standards on how well current computational methods do, and do not, meet clinical and research requirements. As such, CAGI enables appropriate use of methods and promotes the development of improved approaches. There have been notable discoveries from each of the four CAGI experiments to date, and general themes have emerged. Some examples illustrate these themes. A challenge in which participants had to blindly predict how missense variants in p16 affect proliferation in human cell lines showed that predictions for individual variants, even from the top-performing methods, were not consistently accurate. The RAD50 challenge, in which participants were asked to classify RAD50 variants in breast cancer cases and controls, along with other challenges, showed that missense methods tend to correlate better with each other than with experiment (for reasons that may reflect biases in the predictive methods but also in the experimental assays). Bespoke approaches often enhance performance, as also seen in the RAD50 challenge, where knowledge of which domains are involved in DNA repair resulted in more accurate predictions.
Results from a challenge in which participants had to predict whether p53 core domain mutations rescue the activity of inactive p53 showed that missense methods based on protein three-dimensional structure do well in a few cases, while sequence-based methods perform more consistently across most challenges. A challenge in which participants predicted the response of 54 breast cancer cell lines to a panel of cancer drugs showed that training the model with external data made it possible to predict drug sensitivity in the correct ballpark. Interpretation of non-coding variants shows promise but is not at the level of missense; one challenge showing this asked participants to predict the probability that variants in BRCA1 and BRCA2 collected by Free the Data are pathogenic. Two groups correctly predicted all the deletion and insertion mutations, but they were not able to correctly classify all the intron mutations. CAGI findings suggest that running multiple uncalibrated methods and considering their consensus may result in undue confidence, making this procedure inadvisable.

Citation Format: Gaia Andreoletti, Roger A. Hoskins, Susanna Repo, Daniel Barsky, Steven E. Brenner, John Moult, CAGI participants. CAGI: The Critical Assessment of Genome Interpretation, a community experiment to evaluate phenotype prediction: implications for predicting impact of variants in cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 3295.