568

Background: Recently, genomic features have proven effective at gene-agnostic detection of homologous recombination deficiency (HRD). However, the extent to which genome analysis can be exploited to assess HRD status remains to be explored. Here, we show that a machine learning (ML) classifier based on mutational signatures enables robust detection and subtyping of HRD and outperforms a current state-of-the-art HRD classifier.

Methods: We performed whole-genome sequencing (WGS) on ~700 breast cancers and identified pathogenic variants in HR-related genes (BRCA1/2, PALB2, CHEK2, RAD51B, ATM, etc.). Mutational signatures of all variant types (single-base substitution (SBS), indel (ID), structural variant (SV), and copy-number variant (CN)) were included as features. Biallelic inactivation of BRCA1/2 by a germline pathogenic variant plus somatic loss of heterozygosity (LOH) was regarded as true HRD. We trained four different classifiers with n-fold cross-validation and used their averaged scores for prediction. In addition, we performed cluster analysis and subgroup analysis within the predicted HRD cases to characterize the heterogeneity of HRD.

Results: We identified a total of 88 (12%) germline pathogenic variant carriers with somatic LOH, including 29 BRCA1 (4%), 28 BRCA2 (4%), 4 PALB2 (0.6%), and 3 RAD51B (0.4%). Among them, 70 (80%) were classified as HRD-positive. As expected, nearly all BRCA1 (28/29) and all BRCA2 (28/28) cases were HRD-positive. All PALB2 (4/4) and RAD51B (3/3) cases were also classified as HRD-positive, confirming their importance in HR. When somatic RAD51B biallelic mutants (10; 1.4%) were included, the HRD enrichment reached statistical significance (Fisher's exact test; P = 0.045). We identified a total of 171 (24%) cases with at least one somatic pathogenic variant with LOH, including 57 PTEN (33%), 40 CDK12 (23%), 17 RAD51B (10%), 17 ARID1A (10%), and 15 BRIP1 (9%). Of these, 72 (42%) were HRD-positive. Our classifier identified 12 (7%) more HRD cases than HRDetect (167 vs. 155; F1 = 0.99 vs. 0.95) using signatures ID6, SV3 (or RS3), LST, and SBS3. Further, it correctly distinguished BRCA1 (33/34) from BRCA2 (27/28) biallelic mutants, both somatic and germline (F1 = 0.97), based on RS3, ID6, RS1, and SBS37. Features of RAD51B were similar to those of BRCA1, and features of PALB2 to those of BRCA2, but our classifier was able to distinguish PALB2 from BRCA2 using a combination of CN11, ID11, LOH, CN8, and SBS40 (F1 ≈ 1). In total, 112 (67%), 48 (29%), and 7 (4%) cases were identified as BRCA1-like, BRCA2-like, and BRCA1/2-like, respectively; of these, 47 (28%) were monoallelic HR-gene mutants and 27 (16%) were wild types. Our results suggest that mutational signatures can identify ~40% (47/120) more HRD cases than mutation profiles alone.

Conclusions: We believe that the method developed here, which uses ML to detect and classify HRD, will benefit from larger WGS datasets and will identify more patients who can benefit from HRD-related therapeutics.
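
The score averaging described in the Methods, and the F1 and Fisher's test statistics quoted in the Results, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 0.5 decision threshold, the one-sided form of Fisher's exact test, and all toy numbers are assumptions made for illustration, since the abstract does not specify them.

```python
from math import comb
from statistics import mean


def average_scores(per_model_scores):
    """Average per-sample scores across classifiers (one list per model)."""
    return [mean(sample) for sample in zip(*per_model_scores)]


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for boolean labels and predictions."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the
    top-left cell given fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Hypothetical scores from four classifiers for four tumors (toy data).
toy_scores = [
    [0.90, 0.20, 0.70, 0.40],
    [0.80, 0.10, 0.60, 0.30],
    [0.95, 0.30, 0.40, 0.60],
    [0.85, 0.20, 0.50, 0.50],
]
toy_truth = [True, False, True, True]  # hypothetical "true HRD" labels

avg = average_scores(toy_scores)
pred = [s >= 0.5 for s in avg]  # 0.5 threshold is an assumption
toy_f1 = f1_score(toy_truth, pred)

# Hypothetical table: rows = mutant / non-mutant, cols = HRD+ / HRD-.
toy_p = fisher_exact_greater(3, 0, 0, 3)
```

Averaging scores before thresholding lets a sample that one classifier scores low still be called positive when the other classifiers agree, which is one common motivation for this kind of ensemble.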