Comparative analysis of machine learning algorithms for biomedical text document classification: A case study on cancer-related publications

Ekrem Kucuk, Cihan Yetis, Zeynep Kucukakcali, Ipek Cicek

doi:10.5455/medscience.2023.10.209


Open Access

https://doi.org/10.5455/medscience.2023.10.209


Abstract

Biomedical text document classification is an essential task within Natural Language Processing (NLP), a field whose text classification applications range from sentiment analysis to authorship identification. Despite advances in traditional machine-learning algorithms such as Support Vector Machines (SVM) and Logistic Regression, challenges such as data sparsity and high dimensionality persist, and recent years have seen a surge in the use of deep learning models to mitigate them. This study conducts a comparative analysis of several machine-learning algorithms for classifying biomedical text documents. It uses the "Medical Text Dataset - Cancer Doc Classification" from Kaggle, which comprises 7570 biomedical text documents labeled with one of three cancer types (colon, lung, and thyroid). A preprocessing pipeline of tokenization, stop-word removal, and Term Frequency-Inverse Document Frequency (TF-IDF) vectorization is applied. Logistic Regression, SVM, and Multinomial Naive Bayes are evaluated through 5-fold cross-validation, with performance measured by accuracy, precision, recall, F1 score, and area under the ROC curve (AUC ROC). Logistic Regression outperforms the other algorithms with an accuracy of 78.3% and an AUC ROC of 88.59%; SVM and Multinomial Naive Bayes follow with lower scores. Hyperparameter tuning further improves the performance of the algorithms, particularly Logistic Regression. By systematically comparing these algorithms on the same dataset, the study identifies Logistic Regression as the most effective and underscores the importance of algorithm selection and hyperparameter tuning for biomedical text classification.
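The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the stop-word list, the example documents, and the helper names `preprocess`, `tfidf`, and `k_fold_indices` are all assumptions made for illustration, and the TF-IDF weighting shown is the plain textbook form rather than whatever variant the study's tooling applied.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the abstract does not say
# which list the study used.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "with"}


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = ln(N / df), the textbook
    form; libraries often apply a smoothed variant instead.
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()  # number of documents containing each term
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for unshuffled k-fold cross-validation."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Made-up example documents, one per cancer type mentioned in the abstract.
docs = [
    "Thyroid nodules in the thyroid gland",
    "Colon polyps and colon screening",
    "Lung nodules in the lung",
]
vectors = tfidf(docs)
folds = list(k_fold_indices(10, k=5))
```

In this toy corpus, a term that appears in only one document ("thyroid") receives a higher weight than one shared across documents ("nodules"), which is the property that makes TF-IDF useful for separating classes. In practice each fold's training indices would feed a classifier such as Logistic Regression, with the held-out fold used to compute the reported metrics.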


Journal: Medicine Science | International Medical Journal
Publication Date: Jan 1, 2024
License type: CC BY-NC-ND 4.0

