Biomedical text classification spans several significant research fields, including information extraction, information retrieval, and text categorisation. This study examines a range of text categorisation techniques used in practice, together with their strengths and weaknesses, to improve understanding of the information extraction opportunities available in data mining. We compiled a dataset covering three categories: "Thyroid Cancer," "Lung Cancer," and "Colon Cancer." This paper presents an empirical study of a classifier, carried out on biomedical literature benchmarks. Several metaheuristic algorithms are investigated, including genetic algorithms, particle swarm optimisation, and the firefly, cuckoo, and bat algorithms. In addition, the proposed multiple classifier system outperforms ensemble learning, ensemble pruning, and traditional classification methods. Using basic exploratory data analysis (EDA), text preprocessing, and several models such as Logistic Regression, Decision Tree Classification, and Random Forest Classification, we predict whether a given text concerns Thyroid Cancer, Lung Cancer, or Colon Cancer.
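To make the last step concrete, the following is a minimal sketch of a preprocessing-plus-classification pipeline for the three categories. It is an illustration under stated assumptions, not the study's implementation: the toy corpus, the stop-word list, and all function names (`tokenize`, `build_vocab`, `vectorize`, `train`, `predict`) are hypothetical, and a hand-written multinomial (softmax) logistic regression over bag-of-words counts stands in for the Logistic Regression model named above.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; the study's actual dataset is not reproduced here.
TRAIN = [
    ("thyroid nodule biopsy showed papillary carcinoma", "Thyroid Cancer"),
    ("thyroid hormone levels after thyroidectomy", "Thyroid Cancer"),
    ("lung nodule on chest ct in a smoker", "Lung Cancer"),
    ("non small cell lung carcinoma with pleural effusion", "Lung Cancer"),
    ("colon polyp removed during colonoscopy", "Colon Cancer"),
    ("colorectal adenocarcinoma of the sigmoid colon", "Colon Cancer"),
]
LABELS = ["Thyroid Cancer", "Lung Cancer", "Colon Cancer"]
# Assumed stop-word list, chosen only for this example.
STOPWORDS = {"a", "the", "of", "in", "on", "with", "after", "during"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_vocab(texts):
    """Map each distinct token to a column index."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocab):
    """Bag-of-words counts, plus a trailing bias feature fixed at 1."""
    vec = [0.0] * (len(vocab) + 1)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:  # tokens unseen during training are ignored
            vec[vocab[tok]] = float(count)
    vec[-1] = 1.0
    return vec


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, x):
    """One linear score per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def train(X, y, n_classes, epochs=200, lr=0.5):
    """Batch gradient descent on the cross-entropy loss."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for x, label in zip(X, y):
            probs = softmax(class_scores(W, x))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, xj in enumerate(x):
                    grad[k][j] += err * xj
        for k in range(n_classes):
            for j in range(n_features):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict(text, W, vocab):
    """Return the label with the highest score."""
    scores = class_scores(W, vectorize(text, vocab))
    return LABELS[scores.index(max(scores))]


vocab = build_vocab([text for text, _ in TRAIN])
X = [vectorize(text, vocab) for text, _ in TRAIN]
y = [LABELS.index(label) for _, label in TRAIN]
W = train(X, y, len(LABELS))
```

Calling `predict("thyroid biopsy", W, vocab)` returns one of the three labels for a new text. On a real corpus, the bag-of-words counts would usually be replaced by a weighted representation such as TF-IDF, and the hand-written training loop by a library model, with Decision Tree and Random Forest classifiers fitted on the same features for comparison.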