Abstract

Breast cancer is classified into five intrinsic subtypes that differ in treatment and prognosis, so accurate identification of the subtype from a patient's transcriptome data is essential. Many gene signatures, including PAM50, have been developed to classify breast cancer subtypes. However, existing gene selection methods do not use biological pathways, whereas a signature selected with pathway information allows its genes to be explained in terms of biological function. We therefore propose a probabilistic model for pathway-guided gene set selection from gene expression data. First, we defined gene and pathway factors based on gene expression and pathway activation levels, and calculated the posterior probability. Second, we adopted the prediction strength to guide gene set selection. Third, we selected the gene set using the posterior probability and prediction strength values. Finally, in experimental evaluation, our gene set outperformed the PAM50 gene set, a gene set selected by an XGBoost classifier, and a random gene set on classification tasks. Among the genes selected by our method, those in the cell cycle and circadian rhythm pathways showed distinct expression patterns across the breast cancer subtypes, indicating that the selected gene set is biologically meaningful in terms of pathway activation.
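The abstract does not define "prediction strength". One widely used definition, assumed here and possibly different from the one the authors use, is the clustering prediction strength of Tibshirani and Walther: samples are split into training and test sets, each is clustered, and each test sample is also assigned to the nearest training-set cluster. For every test cluster, the score is the fraction of ordered sample pairs that the training clusters also place together, and the prediction strength is the minimum over test clusters. The function name `prediction_strength` and its inputs are hypothetical; this is a minimal sketch that takes the two label assignments as given rather than performing the clustering.

```python
from collections import Counter
from typing import Sequence


def prediction_strength(test_labels: Sequence[int], predicted_labels: Sequence[int]) -> float:
    """Tibshirani-Walther style prediction strength (assumed definition).

    test_labels:      cluster label of each test sample from clustering the test set
    predicted_labels: label of each test sample when assigned to training-set centroids
    """
    if len(test_labels) != len(predicted_labels):
        raise ValueError("label sequences must have equal length")

    strengths = []
    for cluster in set(test_labels):
        # Training-centroid assignments of the samples in this test cluster.
        members = [p for t, p in zip(test_labels, predicted_labels) if t == cluster]
        n = len(members)
        if n < 2:
            continue  # assumption: singleton clusters contain no pairs and are skipped
        # Ordered pairs (i != i') that the training centroids also co-assign.
        same_pairs = sum(c * (c - 1) for c in Counter(members).values())
        strengths.append(same_pairs / (n * (n - 1)))

    if not strengths:
        raise ValueError("no test cluster has at least two members")
    # The weakest cluster determines the overall prediction strength.
    return min(strengths)


print(prediction_strength([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]))
```

Taking the minimum rather than the mean means a single unstable cluster lowers the score, which is why this quantity is often used to judge whether a grouping generalizes to held-out samples.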
