Abstract
Currently, cancer is a worldwide public health problem. Machine and deep learning techniques hold great promise in healthcare by analyzing Electronic Health Records (EHR) that contain a large collection of structured and unstructured data. However, most research has been done with structured data, and valuable data is also found in doctor’s plain-text notes. Thus, this paper proposes an approach to classify breast, liver, and lung cancer based on structured and unstructured data obtained from the MIMIC-II clinical database by using machine and deep learning techniques. In particular, the Paragraph Vector algorithm is used as a deep learning approach to text representation. The goal of this work is to help physicians in early diagnosis of cancer. The proposed approach was tested on a balanced dataset of breast, liver, and lung cancer patient records. Pre-processing is done with structured and unstructured data, and the result is used as input variables to three machine learning models: Support Vector Machines, Multi Layer Perceptron, and Adaboost-SAMME. Then, the scoring metrics for these models are calculated in different training data configurations to choose the best performing model for classification. Results show that the best performing model was obtained with MLP, achieving 89% precision using unstructured data.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.