Abstract

Many association studies analyze the genotype frequencies of case and control data to predict susceptibility to diseases and cancers. When the raw genotype data are not provided, such studies cannot be fully interpreted. In addition, interactions among single nucleotide polymorphisms (SNPs) are often not addressed, which limits the potential of such studies. To solve these problems, we propose a novel computational method, with source code, that generates a simulated genotype dataset from published SNP genotype frequencies. In this study we evaluate the combined effect of 26 SNP combinations related to eight published growth factor-related genes involved in carcinogenesis pathways of breast cancer. A genetic algorithm (GA) was chosen to analyze multiple independent SNPs simultaneously. The GA performs feature selection over different SNP combinations and their corresponding genotypes (each such combination is called a SNP barcode), and it efficiently identifies a specific SNP barcode with an optimized fitness value. The best SNP barcode, defined as the one with the maximal difference in occurrence between the control and breast cancer groups, is used together with an odds ratio analysis to evaluate breast cancer susceptibility. Compared with their corresponding non-SNP barcodes, specific SNP barcodes with two to five SNPs have estimated odds ratios for breast cancer of less than 1 (about 0.85 and 0.87; confidence interval: 0.7473∼0.9585, p < 0.01). Therefore, by in silico analysis, we were able to identify potential combinations of growth factor-related genes, together with their SNP barcodes, that were protective against breast cancer.
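
The two building blocks described above, simulating genotypes from published frequencies and scoring a SNP barcode by its odds ratio, can be illustrated with a minimal sketch. This is not the authors' source code: the SNP names, the genotype frequencies, the group size of 1000, and the helper names (`simulate_group`, `count_barcode`, `odds_ratio`) are all hypothetical, each SNP is assumed to be drawn independently, and the confidence interval uses the standard Wald approximation on the log odds ratio. The GA search over candidate barcodes is not shown.

```python
import math
import random

# Hypothetical genotype frequencies (illustrative values, not from the study).
CASE_FREQS = {
    "SNP_A": {"AA": 0.50, "AG": 0.40, "GG": 0.10},
    "SNP_B": {"CC": 0.60, "CT": 0.30, "TT": 0.10},
}
CONTROL_FREQS = {
    "SNP_A": {"AA": 0.45, "AG": 0.42, "GG": 0.13},
    "SNP_B": {"CC": 0.55, "CT": 0.33, "TT": 0.12},
}


def simulate_group(freqs, n, rng):
    """Return n simulated individuals, each a dict {snp: genotype}.

    Assumption: every SNP is sampled independently from its own
    genotype frequency distribution.
    """
    columns = {
        snp: rng.choices(list(dist), weights=list(dist.values()), k=n)
        for snp, dist in freqs.items()
    }
    return [{snp: columns[snp][i] for snp in freqs} for i in range(n)]


def count_barcode(group, barcode):
    """Count individuals whose genotypes match every SNP in the barcode."""
    return sum(
        all(person[snp] == genotype for snp, genotype in barcode.items())
        for person in group
    )


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: cases with the barcode,    b: cases without it,
    c: controls with the barcode, d: controls without it.
    All four counts must be non-zero.
    """
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


rng = random.Random(0)  # fixed seed so the simulation is repeatable
cases = simulate_group(CASE_FREQS, 1000, rng)
controls = simulate_group(CONTROL_FREQS, 1000, rng)

# One candidate SNP barcode: a genotype chosen for each selected SNP.
barcode = {"SNP_A": "AA", "SNP_B": "CC"}
a = count_barcode(cases, barcode)
b = len(cases) - a
c = count_barcode(controls, barcode)
d = len(controls) - c

estimate, lower, upper = odds_ratio(a, b, c, d)
print(f"OR = {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

In this sketch an odds ratio below 1 would suggest that the barcode is less common among cases than controls, which is the sense in which the abstract calls a barcode protective. A search procedure such as a GA would repeat the counting step over many candidate barcodes and keep the one with the largest case-control difference.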
