Abstract

This study investigates the automatic categorization of time period metadata in fiction, a critical but often overlooked aspect of cataloging. Using a comparative analysis approach, the performance of three machine learning techniques, namely Latent Dirichlet Allocation (LDA), Sentence-BERT (SBERT), and Term Frequency-Inverse Document Frequency (TF-IDF) were assessed, by examining their precision, recall, F1 scores, and confusion matrix results. LDA identifies underlying topics within the text, TF-IDF measures word importance, and SBERT measures sentence semantic similarity. Based on F1-score analysis and confusion matrix outcomes, TF-IDF and LDA effectively categorize text data by time period, while SBERT performed poorly across all time period categories.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call