Abstract

Several gene signatures have been identified to build predictors of chemosensitivity for breast cancer. It is crucial to understand how each gene in a signature contributes to the prediction, i.e., to make the prediction model interpretable rather than treating it as a black box. We used Random Forests (RFs) to build two interpretable predictors of pathologic complete response (pCR) based on two gene signatures. One signature consisted of the top 31 probe sets (27 genes) differentially expressed between pCR and residual disease (RD), chosen from a previous study; the other consisted of the genes involved in the Notch signaling pathway (113 genes). Compared with the predictor in the previous study, both predictors had a higher accuracy (82% v 76% and 79% v 76%), a higher specificity (91% v 71% and 98% v 71%), and a higher positive predictive value (PPV) (68% v 52% and 73% v 52%). Furthermore, Random Forests were employed to calculate the importance of each gene in the two signatures. Our functional annotation suggested that the important genes identified by the feature selection scheme of Random Forests are of biological significance.
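As a rough illustration of how a Random Forest can both classify samples and rank features, the sketch below fits a forest to synthetic data and reads off per-feature importances. It is not the authors' pipeline: the data are simulated, scikit-learn is an assumed toolkit, the sample size, gene count, and number of trees are arbitrary, and the importance shown is scikit-learn's impurity-based measure, which may differ from the measure used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for an expression matrix (ASSUMPTION: not real patient data).
rng = np.random.default_rng(0)
n_patients, n_genes = 200, 10
X = rng.normal(size=(n_patients, n_genes))

# Hypothetical response label driven only by "genes" 0 and 1.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Out-of-bag (OOB) samples give an internal accuracy estimate without a separate test set.
forest = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
forest.fit(X, y)

# Impurity-based importances; they are normalized to sum to 1.
importance = forest.feature_importances_
ranking = np.argsort(importance)[::-1]
top_two = sorted(int(i) for i in ranking[:2])

print("OOB accuracy:", round(forest.oob_score_, 2))
print("Most important gene indices:", top_two)
```

In this toy setting the two informative features should dominate the ranking, which is the behavior a gene-importance analysis relies on.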

Highlights

  • Breast cancer is a clinically heterogeneous disease that demonstrates a wide variation in its clinical courses and response to chemotherapy

  • In a significant proportion of breast cancer patients, chemotherapy does not result in response, but can induce significant side effects and financial costs

  • We sought to explore the utility of Random Forests by constructing two predictors based on two signatures, the top 31 probe sets and the Notch signature, and to take advantage of the feature selection capability of Random Forests to measure the importance of each gene in these signatures

Summary

INTRODUCTION

Breast cancer is a clinically heterogeneous disease that demonstrates a wide variation in its clinical courses and response to chemotherapy. A 30-probe set Diagonal Linear Discriminant Analysis (DLDA-30) classifier was selected for independent validation. It showed a significantly higher sensitivity (92% v 61%) than a clinical predictor including age, grade, and estrogen receptor status. This 30-probe set pharmacogenomic predictor correctly identified all but one of the patients who achieved pCR (12 of 13 patients), and all but one of the patients predicted to have residual disease did have residual cancer (27 of 28 patients). We sought to explore the utility of Random Forests by constructing two predictors based on two signatures, the top 31 probe sets and the Notch signature, and to take advantage of the feature selection capability of Random Forests to measure the importance of each gene in these signatures.
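The patient counts quoted above map directly onto the usual confusion-matrix rates, and a short worked calculation may make the percentages easier to check. In the sketch below, the true-positive, false-negative, and true-negative counts come from the text (12 of 13 and 27 of 28). The false-positive count is not stated here; the value 11 is an assumption, chosen only because it reproduces the 71% specificity, 52% PPV, and 76% accuracy reported for the earlier predictor.

```python
# Counts for the earlier DLDA-30 predictor, as far as the text states them.
tp = 12  # patients with pCR correctly predicted as pCR (12 of 13)
fn = 1   # patient with pCR predicted as residual disease
tn = 27  # patients predicted as residual disease who had residual disease (27 of 28)
fp = 11  # ASSUMPTION: not given in the text; picked to match the reported rates


def rates(tp, fp, tn, fn):
    """Return standard diagnostic rates from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


metrics = rates(tp, fp, tn, fn)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Under these counts the sensitivity is 12/13 (about 92%) and the negative predictive value is 27/28 (about 96%); only the specificity, PPV, and accuracy depend on the assumed false-positive count.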

Patient Cohorts and Clinical Information
Top 31 Probe Set Signature
Notch Signature
False Discovery Rate
Random Forests
Feature Selection Using Random Forests
RESULTS
Importance of the Genes in Top 31 Probe Sets
Importance of the Genes in Notch Signature
CONCLUSIONS