Abstract

Breast cancer is a complex disease, and its effective treatment requires affordable diagnostic and subtyping signatures. While the use of machine learning in clinical computational biology is still in its infancy, the prevalent approach to identifying molecular biomarkers remains the screening of all candidates by differential expression analysis. Many such studies have used miRNA expression data in breast cancer and have reported a multitude of differentially expressed miRNAs; hence, the minimal set of miRNA biomarkers needed to classify breast cancer has yet to be identified. The availability of large and diverse cancer datasets such as The Cancer Genome Atlas (TCGA) has facilitated the molecular profiling of patients' tumors and introduced new challenges, such as deriving clinical-grade interpretations from big data. In this study, the miRNA expression dataset of breast cancer patients from the TCGA database was used to develop prediction models, from which miRNA biomarkers were identified for the diagnosis and molecular subtyping of this cancer. I took advantage of the interpretability of tree-based classification models to extract their rules and identify a minimal set of biomarkers. Empirical negative control miRNAs in breast cancer were obtained and used to normalize the dataset. The tree-based machine learning models trained in my analysis used hsa-miR-139 with hsa-miR-183 to distinguish breast tumors from normal samples, and hsa-miR-4728 with hsa-miR-190b to further classify these tumors into three major subtypes of breast cancer. In addition to the proposed biomarkers, the most important miRNAs in breast cancer classification are also presented.
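
The idea of reading a minimal biomarker set off an interpretable tree can be illustrated with a small sketch. The abstract does not say which tree algorithms, hyperparameters, or normalization steps were used, so everything below is an assumption for illustration only: a tiny Gini-based decision tree written in plain Python, a synthetic expression table, and placeholder feature names (`miR-A`, `miR-B`, `miR-noise`) that do not correspond to real miRNAs or to TCGA data.

```python
# Illustrative sketch only: synthetic data, hypothetical feature names,
# and a hand-written tree (not the study's actual models or pipeline).

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, labels, features):
    """Return (feature, threshold, weighted_gini) for the best split, or None."""
    best = None
    n = len(rows)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def build_tree(rows, labels, features, depth=0, max_depth=2):
    """Grow a depth-limited binary tree represented as nested dicts."""
    majority = max(sorted(set(labels)), key=labels.count)
    if depth == max_depth or gini(labels) == 0.0:
        return {"leaf": majority}
    split = best_split(rows, labels, features)
    if split is None:
        return {"leaf": majority}
    f, t, _ = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           features, depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            features, depth + 1, max_depth),
    }

def extract_rules(node, path=()):
    """Flatten the tree into (condition string, predicted class) pairs."""
    if "leaf" in node:
        return [(" and ".join(path) or "always", node["leaf"])]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], path + (f"{f} <= {t:.2f}",))
            + extract_rules(node["right"], path + (f"{f} > {t:.2f}",)))

def used_features(node):
    """Features that appear in at least one split: the candidate biomarker set."""
    if "leaf" in node:
        return set()
    return ({node["feature"]} | used_features(node["left"])
            | used_features(node["right"]))

def predict(node, row):
    """Follow the splits down to a leaf and return its class."""
    while "leaf" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

# Synthetic expression table: "tumor" iff miR-A is low and miR-B is high;
# miR-noise carries no class information.
levels = [1, 2, 3, 7, 8, 9]
rows, labels = [], []
for a in levels:
    for b in levels:
        for noise in (4, 6):
            rows.append({"miR-A": a, "miR-B": b, "miR-noise": noise})
            labels.append("tumor" if a <= 3 and b >= 7 else "normal")

features = ["miR-A", "miR-B", "miR-noise"]
tree = build_tree(rows, labels, features, max_depth=2)
rules = extract_rules(tree)
for condition, label in rules:
    print(f"if {condition}: {label}")
print("features used:", sorted(used_features(tree)))
```

In this toy setting the fitted tree splits only on the two informative features and ignores the uninformative one, which mirrors the reasoning in the abstract: the features that appear in the extracted rules form a small candidate signature, whereas differential expression screening would list every feature that differs between groups.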
