Genomic sequencing is increasingly used in the management of patients with cancer. Interpreting somatic variants and their pathogenicity is often complex. Pathogenicity prediction tools are commonly used as part of the expert interpretation of somatic variants, but most of these tools were initially developed for germline variants. Our aim was to benchmark their performance on somatic variants. We assembled a gold-standard list of 4319 somatic single-nucleotide variants, classified as oncogenic (n = 2996) or neutral (n = 1323) based on their presence in curated databases or on their allele frequency in the general population. We annotated these variants with the most commonly used prediction tools, through the Database for Nonsynonymous SNPs' Functional Predictions (dbNSFP) and the Universal Mutation Database Predictor (UMD-Predictor), and computed performance metrics for each tool. Stratifying the prediction tools by Matthews correlation coefficient and area under the receiver operating characteristic curve identified the top performers: Combined Annotation-Dependent Depletion (CADD), Eigen or Eigen Principal Components (Eigen-PC), Polymorphism Phenotyping version 2 (PolyPhen-2), Protein Variation Effect Analyzer (PROVEAN), UMD-Predictor, and Rare Exome Variant Ensemble Learner (REVEL). Interestingly, Sorting Intolerant From Tolerant (SIFT), a prediction tool commonly used for somatic variants, was ranked in the second performance category. Combining tools pairwise improved performance only marginally, mainly because of discordant predictions.
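The two ranking metrics named above can be illustrated with a minimal Python sketch. It assumes each tool yields a binary call (1 = oncogenic, 0 = neutral) for the Matthews correlation coefficient and a continuous score for the area under the ROC curve. The function names `mcc` and `roc_auc` are hypothetical and do not reflect the study's actual pipeline or any tool's thresholds.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = oncogenic, 0 = neutral)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random oncogenic variant outscores
    a random neutral one (ties count as 1/2). Quadratic time; fine for a sketch."""
    if len(y_true) != len(scores):
        raise ValueError("label and score lists must have the same length")
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one oncogenic and one neutral variant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not from the study: three oncogenic and two neutral variants.
    labels = [1, 1, 1, 0, 0]
    calls = [1, 1, 0, 0, 1]
    tool_scores = [0.9, 0.8, 0.4, 0.3, 0.6]
    print(round(mcc(labels, calls), 3))          # 0.167
    print(round(roc_auc(labels, tool_scores), 3))  # 0.833
```

MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced, as in a 2996/1323 split. AUC does not depend on any particular score threshold.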