Machine learning identifies interacting genetic variants contributing to breast cancer risk: A case study in Finnish cases and controls

Hamid Behravan, Arto Mannermaa, Katri Pylkäs, Robert Winqvist, Veli–Matti Kosma, Jaana M Hartikainen, Maria Tengström

doi:10.1038/s41598-018-31573-5

Open Access

https://doi.org/10.1038/s41598-018-31573-5

Abstract

We propose an effective machine learning approach to identify groups of interacting single nucleotide polymorphisms (SNPs) that contribute most to breast cancer (BC) risk, assuming dependencies among BCAC iCOGS SNPs. We adopt a gradient tree boosting method followed by an adaptive iterative SNP search to capture complex non-linear SNP-SNP interactions and, consequently, obtain groups of interacting SNPs with high BC risk-predictive potential. We also propose a support vector machine built on the identified SNPs to classify BC cases and controls. Our approach achieves a mean average precision (mAP) of 72.66, 67.24 and 69.25 in discriminating BC cases and controls in the KBCP, OBCS and merged KBCP-OBCS sample sets, respectively. These results are better than the mAP of 70.08, 63.61 and 66.41 obtained with a polygenic risk score model derived from 51 known BC-associated SNPs in the same three sample sets. BC subtype analysis further reveals that the 200 KBCP SNPs identified by the proposed method perform favorably in classifying estrogen receptor positive (ER+) and negative (ER−) BC cases in both KBCP and OBCS data. Further, a biological analysis of the identified SNPs reveals genes related to important BC-related mechanisms, estrogen metabolism and apoptosis.
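The abstract reports its results as mean average precision (mAP) but does not spell out how the metric is computed. The following is a minimal sketch under stated assumptions: average precision is taken in its ranked-list form (the mean of precision at each rank where a case appears), the mean is taken over a set of evaluation folds, and the value is scaled to a percentage to match the magnitude of the reported figures. The function names and the toy predictions are hypothetical and are not taken from the study.

```python
from statistics import mean


def average_precision(labels, scores):
    """Average precision for binary labels (1 = case, 0 = control).

    Samples are ranked by descending score; AP is the mean of precision@k
    over every rank k at which a case appears.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("at least one positive label is required")
    return mean(precisions)


def mean_average_precision(folds):
    """Mean AP over (labels, scores) pairs, expressed as a percentage.

    Assumption: one (labels, scores) pair per evaluation fold.
    """
    return 100.0 * mean(average_precision(l, s) for l, s in folds)


# Toy, made-up predictions for two folds (not data from the study).
folds = [
    ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    ([0, 1, 1, 0], [0.2, 0.9, 0.6, 0.4]),
]
print(round(mean_average_precision(folds), 2))
```

Because the metric depends only on the ranking of the scores, any classifier that outputs a continuous risk score (a support vector machine decision value or a polygenic risk score, for instance) can be compared on the same footing.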

Highlights

We propose an effective machine learning approach to identify groups of interacting single nucleotide polymorphisms (SNPs) that contribute most to breast cancer (BC) risk, assuming dependencies among BCAC iCOGS SNPs
We propose a novel machine learning approach to identify groups of interacting SNPs that contribute most to BC risk
We have developed a simple yet effective machine learning-based approach to identify groups of interacting SNPs that contribute most to BC risk

Summary

Introduction

We propose an effective machine learning approach to identify groups of interacting single nucleotide polymorphisms (SNPs) that contribute most to breast cancer (BC) risk, assuming dependencies among BCAC iCOGS SNPs. As a validation study, we use the optimal hyperparameter values and the SNPs identified from the KBCP data to predict OBCS cases and controls in 10 repetitions of 5-fold cross-validation (CV).
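To make the "10 repetitions of 5-fold CV" protocol concrete, here is a minimal sketch of how such splits could be generated. It assumes stratified folds (each fold keeps the case/control ratio), which the text does not state; the function name, the seed, and the toy labels are hypothetical. In the validation setting described above, the splits would be drawn over the OBCS samples while the SNP set and hyperparameters stay fixed at the values found on KBCP.

```python
import random


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated stratified k-fold CV."""
    rng = random.Random(seed)

    # Group sample indices by class label (e.g. 1 = case, 0 = control).
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal indices round-robin so each fold keeps the class balance.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train_idx, test_idx


# Toy labels: 10 cases and 15 controls (not the study's sample sizes).
labels = [1] * 10 + [0] * 15
splits = list(repeated_stratified_kfold(labels))
print(len(splits))  # 50 train/test pairs: 10 repetitions x 5 folds
```

Each of the 50 train/test pairs would yield one performance estimate, and averaging over them reduces the variance that a single 5-fold partition would carry.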

Results

Conclusion

Journal: Scientific Reports	Publication Date: Sep 3, 2018
Citations: 63	License type: open-access

Similar Papers

Abstract 2790: Gene-based analysis of the fibroblast growth factor receptor signaling pathway identifies an association of the FGF1 gene with risk of estrogen receptor-negative breast cancer: The AMBER consortium
Edward A Ruiz-Narvaez ... Christopher A Haiman
Cancer Research | VOL. 75
01 Aug 2015

Abstract 2794: Genetic variations in vitamin D-related pathways and breast cancer risk in African American women
Song Yao ... Qianqian Zhu
Cancer Research | VOL. 75
01 Aug 2015

Evaluation of three polygenic risk score models for the prediction of breast cancer risk in Singapore Chinese.
Claire Hian Tzer Chan ...
Oncotarget | VOL. 9
31 Jan 2018

Breast cancer, menopause, and long-term survivorship: critical issues for the 21st century
Patricia A Ganz
The American Journal of Medicine | VOL. 118
01 Dec 2005
