Abstract
Natural language processing (NLP) has become increasingly important as the volume of textual data in the health sector has grown rapidly. In particular, the COVID-19 pandemic has made fast and easy analysis of health data important for research. Text can be represented with traditional methods such as BoW (bag of words) and TF-IDF (term frequency-inverse document frequency), or with modern word representation methods such as FastText and BERT. BERT models have recently achieved high performance. They are divided into pre-trained and fine-tuned models. To obtain good results in the health domain, BioBERT models are obtained by fine-tuning the basic BERT models on datasets containing biomedical articles. In this study, semantic similarity on datasets is evaluated with BoW, TF-IDF, FastText, BERT, and BioBERT models, using the Pearson correlation method. The evaluations show that BioBERT models yielded higher correlation values than the other models and methods used.
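The evaluation idea can be illustrated with a minimal sketch, assuming the simplest of the representations (BoW) and cosine similarity as the sentence-level similarity score; the abstract does not state which similarity function was used. The sentence pairs and reference scores below are toy values invented purely for illustration, not data from the study, and the function names are hypothetical. Any other representation (TF-IDF, FastText, BERT, BioBERT) would slot in by replacing `bow_vector` and `cosine_similarity` with that model's embedding and similarity.

```python
import math
from collections import Counter


def bow_vector(text):
    """Bag-of-words representation: token -> count (whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data (assumption): (sentence 1, sentence 2, reference similarity score).
pairs = [
    ("the patient has a fever", "the patient has a fever", 5.0),
    ("the patient has a fever", "the patient reports a high fever", 4.0),
    ("the patient has a fever", "the study enrolled ten adults", 1.0),
]

# Model-predicted similarity for each pair.
predicted = [
    cosine_similarity(bow_vector(s1), bow_vector(s2)) for s1, s2, _ in pairs
]
# Reference scores for the same pairs.
reference = [score for _, _, score in pairs]

# A representation is judged by how well its scores track the reference scores.
r = pearson(predicted, reference)
print(f"Pearson r between BoW cosine scores and reference scores: {r:.3f}")
```

A higher Pearson value means the model's similarity scores track the reference scores more closely, which is the sense in which one representation is said to outperform another.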