Abstract

In association studies, the combined effects of single nucleotide polymorphism (SNP)‑SNP interactions and the problem of imbalanced data between cases and controls are frequently ignored. In the present study, we used an improved multifactor dimensionality reduction (MDR) approach, MDR‑ER, to detect high‑order SNP‑SNP interactions in an imbalanced breast cancer data set containing seven SNPs of chemokine CXCL12/CXCR4 pathway genes. Most individual SNPs were not significantly associated with breast cancer. MDR‑ER analysis identified six significant SNP‑SNP interaction models involving the seven genes (highest cross‑validation consistency, 10; classification error rates, 41.3‑21.0; and prediction error rates, 47.4‑55.3). The CD4 and VEGFA genes were associated in a 2‑locus interaction model (classification error rate, 41.3; prediction error rate, 47.5; odds ratio (OR), 2.069; 95% bootstrap CI, 1.40‑2.90; P=1.71E‑04), and this pair also appeared in all of the best 2‑ to 7‑locus models. As the number of loci increased, the classification error rates and P‑values decreased. Statistical power exceeded 0.9 for all of the 2‑ to 7‑locus models. The lowest classification error rate among the MDR‑ER‑generated models was obtained with the 7‑locus interaction model (classification error rate, 21.0; OR, 15.282; 95% bootstrap CI, 9.54‑23.87; P=4.03E‑31). Epistasis network analysis identified the overall effect on breast cancer susceptibility and ranked the SNPs by their impact on breast cancer as follows: CD4 = VEGFA > KITLG > CXCL12 > CCR7 = MMP2 > CXCR4. In conclusion, MDR‑ER can effectively and correctly identify the best SNP‑SNP interaction models in an imbalanced breast cancer data set.
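The abstract does not spell out how MDR‑ER scores a model, so the following is only a minimal sketch of one common way to make MDR‑style classification robust to case/control imbalance. It assumes that a multi‑locus genotype is labelled high‑risk when its share of cases exceeds its share of controls, and that the model is scored with a balanced error rate. The function names, the genotype coding (0/1/2), and the toy data are all hypothetical and are not taken from the study.

```python
from collections import Counter


def mdr_balanced_error(genotypes, labels):
    """Label genotype combinations as high/low risk and score the model.

    genotypes: one tuple per subject, e.g. (0, 1) for a 2-locus model.
    labels:    1 = case, 0 = control.
    Assumption: proportions (not raw counts) are compared, so a minority
    class is not swamped by the majority class.
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)

    # High-risk cell: fraction of all cases > fraction of all controls.
    high_risk = {
        g for g in set(cases) | set(controls)
        if cases[g] / n_cases > controls[g] / n_controls
    }

    fn = sum(c for g, c in cases.items() if g not in high_risk)
    fp = sum(c for g, c in controls.items() if g in high_risk)
    tp, tn = n_cases - fn, n_controls - fp

    # Balanced error: mean of the per-class misclassification rates.
    error = 0.5 * (fn / n_cases + fp / n_controls)
    return high_risk, (tp, fp, fn, tn), error


def odds_ratio(tp, fp, fn, tn):
    """Odds ratio of the 2x2 table (high/low risk vs. case/control)."""
    if fp * fn == 0:
        return float("inf")  # undefined without a continuity correction
    return (tp * tn) / (fp * fn)


# Hypothetical imbalanced toy data: 4 cases, 8 controls, 2 loci.
toy_genotypes = [(0, 1)] * 3 + [(1, 1)] + [(0, 1)] + [(1, 1)] * 3 + [(2, 0)] * 4
toy_labels = [1] * 4 + [0] * 8

risk_cells, table, err = mdr_balanced_error(toy_genotypes, toy_labels)
print(risk_cells, table, err, odds_ratio(*table))
```

In a full analysis this scoring would be applied to every candidate locus combination inside cross‑validation; the sketch covers only the scoring of a single model on a single data split.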

