Abstract

The high throughput of omics data has raised the need for intelligent analytics methods and systems that can reap their full benefits. This work joins the numerous efforts to understand the different mechanisms underlying breast cancer (BC) and focuses on the BC subtyping problem using several omics data types, namely mRNA expression, DNA methylation, and copy number variation (CNV). The data were acquired from The Cancer Genome Atlas (TCGA) portal. These datasets are characterized by a huge number of descriptive features compared with a very small number of instances, and they are highly imbalanced. This makes training machine learning methods, particularly deep learning models, a very challenging task. In this paper, the problem is cast as a multi-class classification problem, and various learning models have been developed to identify the best predictive one in terms of several performance measures. The objective is twofold: to identify the best-performing approach and the most relevant omics data for BC subtype prediction. Three issues have been addressed: reducing the dimensionality of the datasets, balancing the datasets, and determining the best machine learning model. Multiple dimensionality reduction methods and classifiers have been considered, several combinations have been investigated, and a comprehensive experimental study has been conducted to identify the best approach. Results competitive with, and in some cases better than, state-of-the-art approaches have been obtained, with a significant drop in the number of features.

Keywords: Artificial intelligence; Machine learning; Breast cancer subtyping; Multi-class classification; Multi-omics data
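The three issues addressed in this work (dimensionality reduction, class balancing, and classifier choice) fit together as a single pipeline. The following is a minimal pure-Python sketch of that shape, not the authors' method: the high-variance feature filter, random oversampling, and k-nearest-neighbour classifier are simple stand-ins chosen for illustration, the data are synthetic (50 noisy features, only two informative, imbalanced class sizes 40/10/5), and all helper names such as `top_variance_features` and `random_oversample` are hypothetical.

```python
import random
from collections import Counter

def make_toy_data(sizes, n_features, rng):
    """Hypothetical stand-in for an omics matrix: only features 0 and 1 carry class signal."""
    X, y = [], []
    for label, size in enumerate(sizes):
        for _ in range(size):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += 5.0 * label  # class-dependent shift on the two informative features
            row[1] += 5.0 * label
            X.append(row)
            y.append(label)
    return X, y

def top_variance_features(X, k):
    """Dimensionality reduction: indices of the k highest-variance columns (unsupervised filter)."""
    n = len(X)
    variances = []
    for col in zip(*X):
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    return sorted(range(len(variances)), key=variances.__getitem__, reverse=True)[:k]

def select(X, idx):
    """Keep only the selected feature columns."""
    return [[row[j] for j in idx] for row in X]

def random_oversample(X, y, rng):
    """Balancing: duplicate random minority-class rows until every class has the majority count."""
    counts = Counter(y)
    target = max(counts.values())
    Xb, yb = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            Xb.append(rng.choice(members))
            yb.append(label)
    return Xb, yb

def knn_predict(X_train, y_train, x, k=3):
    """Classification: majority vote among the k nearest training rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(X_train, y_train)
    )
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

rng = random.Random(42)
X_train, y_train = make_toy_data([40, 10, 5], n_features=50, rng=rng)  # imbalanced training set
X_test, y_test = make_toy_data([10, 10, 10], n_features=50, rng=rng)

idx = top_variance_features(X_train, k=2)          # fit the filter on training data only
Xtr, Xte = select(X_train, idx), select(X_test, idx)
Xbal, ybal = random_oversample(Xtr, y_train, rng)  # balance the training set only

preds = [knn_predict(Xbal, ybal, x) for x in Xte]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(sorted(idx), Counter(ybal), accuracy)
```

Both the feature filter and the oversampling are fitted on the training split only; applying either before the train/test split would leak test information into the model. In a real study each stand-in would be swapped for the reduction methods and classifiers under comparison, with the pipeline structure unchanged.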
