Abstract

Breast cancer (BC) is a multifactorial disease and the most common cancer in women worldwide. We describe a machine learning approach to identify a combination of interacting genetic variants (SNPs) and demographic risk factors for BC, especially factors related to both familial history (Group 1) and oestrogen metabolism (Group 2), for predicting BC risk. This approach identifies the best combinations of interacting genetic and demographic risk factors that yield the highest BC risk prediction accuracy. In tests on the Kuopio Breast Cancer Project (KBCP) dataset, our approach achieves a mean average precision (mAP) of 77.78 in predicting BC risk by using interacting genetic and Group 1 features, which is better than the mAPs of 74.19 and 73.65 achieved using only Group 1 features and interacting SNPs, respectively. Similarly, using interacting genetic and Group 2 features yields a mAP of 78.00, which outperforms the system based on only Group 2 features, which has a mAP of 72.57. Furthermore, the gene interaction maps built from genes associated with SNPs that interact with demographic risk factors indicate important BC-related biological entities, such as angiogenesis, apoptosis and oestrogen-related networks. The results also show that demographic risk factors are individually more important than genetic variants in predicting BC risk.
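The results above are reported as mean average precision (mAP). As a rough illustration of how such a figure can be computed, the following minimal sketch assumes that mAP is the average precision of the ranked risk predictions, averaged over evaluation folds and scaled to a percentage. That reading, the function names and the toy data are assumptions made here for illustration; they are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average of precision@k over the ranks k at which a true case (label 1) appears."""
    # Rank samples from highest to lowest predicted risk.
    # Ties keep their input order (a simplification of this sketch).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)  # precision among the top `rank` samples
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(folds: Sequence[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Mean of per-fold average precision, as a percentage (assumed convention)."""
    aps = [average_precision(labels, scores) for labels, scores in folds]
    return 100.0 * sum(aps) / len(aps)


if __name__ == "__main__":
    # Hypothetical toy folds: (true case/control labels, predicted risk scores).
    toy_folds = [
        ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
        ([0, 1], [0.2, 0.6]),
    ]
    print(round(mean_average_precision(toy_folds), 2))
```

Under this definition, a model that ranks every true case above every control reaches an mAP of 100, so the reported differences between feature sets reflect how well each one ranks cases ahead of controls.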

Highlights

  • Breast cancer (BC) is a multifactorial disease and the most common cancer in women worldwide

  • We extend our previous achievements in [21] by combining networks of interacting genetic variants with demographic risk factors for BC, in the form of risk factors related to both familial history and oestrogen metabolism

  • New discoveries in recent years have identified a number of risk factors contributing to BC risk, ranging from the genetic variants identified in genome-wide association studies (GWASs) to BC risk factors related to familial history and oestrogen metabolism


Summary

Introduction

Breast cancer (BC) is a multifactorial disease and the most common cancer in women worldwide. One should note that these studies are often based on a limited number of predictor variables and conventional regression models, which might make the estimates imprecise when working with potential multicollinearity in high-dimensional medical data, such as in genetic variants [20]. To address this knowledge gap, in this study, we adopt our ML approach previously published in [21], which is built on an extreme gradient tree boosting (XGBoost) model [22] followed by adaptive iterative feature selection, to capture optimal networks of interacting features (genetic variants and demographic risk factors for BC) in a BC risk prediction task. Factors associated with elevated levels of oestrogen throughout a woman’s lifetime, such as exposure to oestrogen over long periods of time and early onset of menstruation, have been associated with an increased risk of BC [25].

