Traditional Term Frequency-Inverse Document Frequency Research Articles

Abstract

Background and Aims: Large language models (LLMs) have gained significant attention in natural language processing (NLP), marking a shift from traditional techniques such as Term Frequency-Inverse Document Frequency (TF-IDF). We developed a traditional NLP model to predict arteriovenous fistula (AVF) failure within the next 30 days using clinical notes. The goal of this analysis was to investigate whether LLMs would outperform traditional NLP techniques on this task.

Method: We defined AVF failure as a change in status from active to permanently unusable or temporarily unusable. We used data from a large kidney care network from January 2021 to December 2021. Two models were created, one using an LLM and one using the traditional TF-IDF technique. For the LLM we used “distilbert-base-uncased”, a distilled version of the BERT base model [1], and compared its performance with traditional TF-IDF-based NLP techniques. The dataset was randomly divided into 60% training, 20% validation, and 20% test sets. The test set, comprising data from unseen patients, was used to evaluate model performance. Both models were evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity.

Results: The incidence of AVF failure within 30 days was 2.3% in the population. The LLM and the traditional TF-IDF model showed similar overall performance, as summarized in Table 1, with the LLM marginally better on some metrics. Both models had the same AUROC of 0.64 on the test data. Accuracy and balanced accuracy were 72.9% and 59.7%, respectively, for the LLM, compared with 70.9% and 59.6% for the traditional TF-IDF approach. Specificity was 73.2% for the LLM, slightly higher than the 71.2% observed for traditional NLP. However, the LLM had a lower sensitivity of 46.1%, compared with 48% for traditional NLP. Training the LLM also took considerably longer than training the TF-IDF model and used more computational resources, such as graphics processing unit (GPU) instances in cloud-based services, leading to higher cost.

Conclusion: In our study, an advanced LLM performed comparably to traditional TF-IDF modeling in predicting AVF failure. Both models had identical AUROC. Specificity was higher for the LLM, while sensitivity was higher for traditional NLP. The LLM was fine-tuned on a limited dataset, which may explain why its performance was similar to that of traditional NLP methods. This finding suggests that while LLMs may excel in certain scenarios, such as in-depth sentiment analysis of patient data for complex tasks, their effectiveness is highly dependent on the specific use case. It is crucial to weigh the benefits of LLMs against the resources they require, as they can be significantly more resource-intensive and costly than traditional TF-IDF methods. This highlights the importance of a use-case-driven approach to selecting the appropriate NLP technique for healthcare applications.
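The evaluation metrics named in this abstract can be related to one another with a short sketch. The Python below is illustrative only and is not the authors' code: the function names and the confusion-matrix counts are hypothetical, and only the percentage rates passed to `balanced_accuracy_from_rates` come from the abstract. It assumes the usual definition of balanced accuracy as the mean of sensitivity and specificity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics (as fractions) from confusion counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


def balanced_accuracy_from_rates(sensitivity_pct, specificity_pct):
    """Balanced accuracy (percent) as the mean of sensitivity and specificity (percent)."""
    return (sensitivity_pct + specificity_pct) / 2


# Hypothetical confusion counts, chosen only to illustrate the function.
example = classification_metrics(tp=46, fp=268, tn=732, fn=54)
print(example)

# Rates reported in the abstract (percent).
llm_bal = balanced_accuracy_from_rates(46.1, 73.2)    # about 59.65; abstract reports 59.7
tfidf_bal = balanced_accuracy_from_rates(48.0, 71.2)  # about 59.6; abstract reports 59.6
print(f"LLM: {llm_bal:.2f}  TF-IDF: {tfidf_bal:.2f}")
```

Under that definition, the reported balanced accuracies are consistent with the reported sensitivity and specificity for both models. With a 2.3% event rate, plain accuracy is dominated by specificity, which is one reason balanced accuracy and AUROC are more informative here.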

Abstract

Purpose: Online reviews of tourist attractions are an important reference for potential tourists choosing where to go. The main goal of this study is to conduct sentiment analysis that helps users comprehend large volumes of reviews, based on comments about Chinese attractions from the Japanese tourism website 4Travel.

Design/methodology/approach: Different statistics-based and rule-based methods are used to analyze the sentiment of the reviews. Three groups of novel statistics-based methods combining feature selection functions with the traditional term frequency-inverse document frequency (TF-IDF) method are proposed. We also construct seven groups of rule-based methods. The macro-average and micro-average values of the best classification results for each method are calculated, and the performance of the methods is reported.

Findings: We compare the statistics-based and rule-based methods separately and then compare the overall performance of the two kinds of method. The results show that combining feature selection functions with weightings can strongly improve overall performance. The emotional vocabulary in the field of tourism (EVT), kaomojis, and negative and transitional words notably improve performance in all three categories. The rule-based methods outperform the statistics-based ones by a narrow margin.

Research limitations: Two limitations should be noted: 1) the empirical studies verifying the validity of the proposed methods were conducted only on the Japanese language; and 2) deep learning technology has not been incorporated into the methods.

Practical implications: The results help to elucidate the intrinsic characteristics of the Japanese language and their influence on sentiment analysis. These findings also provide practical usage guidelines for sentiment analysis of Japanese online tourism reviews.

Originality/value: Our research has practical value. Currently, there are no studies that focus on the sentiment analysis of Japanese reviews about Chinese attractions.
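For readers unfamiliar with the "traditional TF-IDF" weighting that both abstracts refer to, here is a minimal sketch in plain Python. It assumes the textbook form, term frequency multiplied by log(N / document frequency). It is not the method of either paper, and library implementations often use smoothed or normalized variants. The function name `tfidf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Weight = (count / doc length) * log(N / df),
    where df is the number of documents that contain the term.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy documents, already tokenized on whitespace.
docs = [
    "fistula flow reduced today".split(),
    "fistula functioning well".split(),
    "patient reports arm swelling near fistula".split(),
]
w = tfidf(docs)
print(w[0])
```

In this toy corpus "fistula" appears in every document, so its inverse document frequency is log(1) = 0 and it receives zero weight, while a term such as "reduced" that appears in only one document is weighted up. This is the behavior that feature selection functions and modified weightings aim to refine.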

Articles published on Traditional Term Frequency-Inverse Document Frequency

#2924 Comparison of large language models and traditional natural language processing techniques in predicting arteriovenous fistula failure

Self-admitted technical debt classification using natural language processing word embeddings

Evaluation of Average Term Occurrences Weighting Technique for Arabic Textual Information Retrieval

Deep learning-based approach for Arabic open domain question answering

Detecting Emotion in Indonesian Tweets: A Term-Weighting Scheme Study

Keyword Extraction from Scientific Research Projects Based on SRP‐TF‐IDF

Classification and analysis of literary works based on distribution weighted term frequency-inverse document frequency

Arabic Questions Classification Using Modified TF-IDF

Ontology Driven Social Big Data Analytics for Fog enabled Sentic-Social Governance

Special Issue on Recent Trends and Future of Fog and Edge Computing, Services and Enabling Technologies

Sentiment Analysis of Japanese Tourism Online Reviews

Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications

Mimvec: a deep learning approach for analyzing the human phenome

Turning from TF-IDF to TF-IGM for term weighting in text classification

Method research of new event detection based on news element
