Abstract

One of the most important challenges facing researchers in the medical field is the variety and sheer volume of data types that can be generated, even for a single patient. Machine learning models are commonly used to extract knowledge from medical data. However, most of these models are built on a single data type and ignore valuable information contained in other data types. In this paper, we seek to improve the efficiency of machine learning models in knowledge extraction by integrating data through ensemble stacked generalization, a supervised learning method. We propose a framework that adopts feature selection techniques to improve the stacked models, with the aim of outperforming state-of-the-art data integration models. Because each data type may be best suited to a different machine learning model, we also draw information from multiple data sources using multiple machine learning models so that the data can be explored effectively. The proposed predictive model is applied to the METABRIC dataset to predict breast cancer survival at 2000 days after diagnosis. Our ensemble stacked generalization method achieves 80% prediction accuracy, which compares favorably with other known ensemble methods. We also demonstrate that some enhanced feature selection methods select relevant features that are close to those selected by a biological expert from breast cancer pathways.
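To make the idea of stacking over multiple data types concrete, here is a minimal sketch of stacked generalization with per-view feature selection. It is not the authors' framework and does not use METABRIC. The two "views" (data types), the synthetic data generator, the correlation-based filter, and the use of logistic regression as both base learner and meta-learner are all hypothetical choices for illustration; in the paper, each data type may be paired with a different model. The key pattern is that feature selection and base-model fitting happen inside each cross-validation fold, and the meta-learner is trained on out-of-fold base-model scores.

```python
import math
import random

def make_data(n, rng):
    """Hypothetical two-view data: view A (5 features), view B (20 features).
    The label depends only on a0, b0 and b1; every other feature is noise."""
    view_a, view_b, y = [], [], []
    for _ in range(n):
        a = [rng.gauss(0, 1) for _ in range(5)]
        b = [rng.gauss(0, 1) for _ in range(20)]
        view_a.append(a)
        view_b.append(b)
        y.append(1 if a[0] + b[0] + b[1] > 0 else 0)
    return view_a, view_b, y

def select_top_k(X, y, k):
    """Filter feature selection: keep the k columns with the largest |Pearson r| to y."""
    n = len(y)
    my = sum(y) / n
    sy = math.sqrt(sum((t - my) ** 2 for t in y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        cov = sum((c - mx) * (t - my) for c, t in zip(col, y))
        sx = math.sqrt(sum((c - mx) ** 2 for c in col))
        scores.append(abs(cov) / (sx * sy + 1e-12))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(model, x):
    """Linear decision score (logit) of a fitted logistic regression."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def fit_logreg(X, y, lr=0.1, epochs=200):
    """Plain batch-gradient-descent logistic regression (weights + bias)."""
    d, n = len(X[0]), len(y)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score((w, b), xi)) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def base_pipeline(X, y, k):
    """Feature selection followed by a base learner, for one data type (view)."""
    cols = select_top_k(X, y, k)
    model = fit_logreg([[row[j] for j in cols] for row in X], y)
    return lambda x: score(model, [x[j] for j in cols])

def stacked_fit(views, y, ks, n_folds=5):
    """Train the meta-learner on out-of-fold base scores, then refit the bases on all data."""
    n = len(y)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    meta_X = [[0.0] * len(views) for _ in range(n)]
    for held in folds:
        held_set = set(held)
        train_idx = [i for i in range(n) if i not in held_set]
        for v, (X, k) in enumerate(zip(views, ks)):
            f = base_pipeline([X[i] for i in train_idx], [y[i] for i in train_idx], k)
            for i in held:
                meta_X[i][v] = f(X[i])
    meta = fit_logreg(meta_X, y)
    bases = [base_pipeline(X, y, k) for X, k in zip(views, ks)]
    return bases, meta

def stacked_predict(bases, meta, row_views):
    z = [f(x) for f, x in zip(bases, row_views)]
    return 1 if score(meta, z) > 0 else 0

rng = random.Random(0)
A_tr, B_tr, y_tr = make_data(200, rng)
A_te, B_te, y_te = make_data(200, rng)
bases, meta = stacked_fit([A_tr, B_tr], y_tr, ks=[2, 5])
preds = [stacked_predict(bases, meta, (a, b)) for a, b in zip(A_te, B_te)]
accuracy = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
print(f"stacked accuracy on synthetic data: {accuracy:.2f}")
```

Selecting features inside each fold, rather than once on the full training set, keeps the out-of-fold scores that feed the meta-learner free of label leakage. The printed accuracy reflects only the toy data above and says nothing about the 80% figure reported for METABRIC.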
