Articles published on term-frequency-inverse-document-frequency
4076 Search results
- Research Article
- 10.55041/isjem05359
- Jan 22, 2026
- International Scientific Journal of Engineering and Management
- V Mageswari + 1 more
Abstract: The rapid expansion of social media has transformed online communication, allowing users to share thoughts and opinions instantly. Along with these benefits, social platforms have also become a space where cyberbullying and abusive language spread quickly, causing serious emotional and psychological harm. Because social media content is large in volume, short in length, and highly informal, identifying harmful messages manually is both time-consuming and unreliable[1][2][6]. This paper presents a machine learning–based system for detecting cyberbullying in social media text. The proposed approach processes user-generated content using Natural Language Processing techniques such as tokenization, stop-word removal, and text normalization[10][11]. The cleaned text is converted into numerical features using the TF–IDF method and classified using a Random Forest algorithm to determine whether the content is bullying or non-bullying[14][15]. A Flask-based web application is developed to provide real-time prediction through a simple and user-friendly interface[12]. The results show that the system can effectively identify harmful messages from short social media posts, making it a practical tool for improving online safety and supporting automated content moderation[1][5]. Keywords: Cyberbullying detection, social media analysis, machine learning, natural language processing, TF–IDF, Random Forest, text classification.
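The preprocessing steps named in this abstract (text normalization, tokenization, stop-word removal) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list, the regular expressions, and the function name `preprocess` are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "to", "and", "so"}

def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, and remove stop words from a short post."""
    text = text.lower()                                # case normalization
    text = re.sub(r"https?://\S+|@\w+", " ", text)     # drop URLs and @mentions
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)         # squeeze runs of 3+ repeated characters to 2
    tokens = re.findall(r"[a-z']+", text)              # simple alphabetic tokenizer
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal

tokens = preprocess("@user You are SOOOO stupid!!! http://x.co")
print(tokens)
```

The resulting token list is what a TF–IDF weighting step would consume before any classifier is trained.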
- Research Article
- 10.59581/jkts-widyakarya.v4i1.5875
- Jan 20, 2026
- Jurnal Kendali Teknik dan Sains
- Achmad Faris Fadhlulah + 4 more
The Indonesia Smart Program (Program Indonesia Pintar/PIP) is a government initiative aimed at ensuring equal access to education for students from underprivileged families, including those at the junior high school (SMP) level. However, at the school level, the management of PIP recipient data still faces several challenges, particularly in data searching and utilization, due to the increasing volume of data and the use of simple or manual search methods. These conditions can lead to delays in obtaining information and reduce the accuracy of decision-making. Therefore, an effective information retrieval system is needed to manage and search PIP recipient data efficiently. This study aims to design and develop an Information Retrieval System for PIP recipient data at the junior high school level using the Term Frequency–Inverse Document Frequency (TF-IDF) method. The TF-IDF method is applied to assign weights to terms in each document, enabling the system to identify and rank documents based on their relevance to user queries. The test results show that the system is able to measure document relevance accurately, where documents D3 and D4 obtain the highest similarity value of 0.099586089 and are classified as highly relevant, while other documents show lower similarity values down to zero. These results are also supported by graphical visualization, which helps users compare relevance levels more clearly. Thus, the implementation of the TF-IDF method has proven to be effective in supporting accurate, efficient, and systematic searching and management of PIP recipient data at the junior high school level.
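The term weighting and relevance ranking described in this abstract can be sketched in plain Python. The abstract does not state which TF-IDF variant or similarity measure the system uses, so the formula here (relative term frequency times log(N/df), compared with cosine similarity) is an assumption, as are the toy documents and the function names.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [
        {t: c / len(doc) * idf[t] for t, c in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, docs):
    """Return (document index, score) pairs, most similar first."""
    vectors, idf = tfidf_vectors(docs)
    q_counts = Counter(t for t in query if t in idf)  # ignore unseen terms
    q_vec = {t: c / len(query) * idf[t] for t, c in q_counts.items()}
    scores = [cosine(q_vec, v) for v in vectors]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical tokenized records, not data from the study.
docs = [
    ["student", "grade", "seven", "recipient"],
    ["teacher", "schedule"],
    ["recipient", "scholarship", "grade", "eight"],
]
ranked = rank(["scholarship", "recipient"], docs)
print(ranked)
```

Documents that share no terms with the query score zero, which matches the abstract's observation that some documents have similarity values down to zero.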
- Research Article
- 10.29408/jit.v9i1.33688
- Jan 20, 2026
- Infotek: Jurnal Informatika dan Teknologi
- Bayu Setiawan + 2 more
In the digital era, customer reviews play an important role in shaping a restaurant’s reputation and assessing the quality of its services. Feedback provided by customers through reviews can be utilized as a valuable source of information for service improvement. This study aims to analyze customer sentiment toward Warung Pedes Gemes restaurant using a sentiment analysis approach based on the Naive Bayes algorithm. Review data were collected through questionnaires and processed through several stages, including data preprocessing, weighting using Term Frequency–Inverse Document Frequency (TF-IDF), sentiment classification, and model evaluation using a confusion matrix. The results indicate that the majority of customer reviews were classified as positive, with 543 positive reviews and 22 negative reviews identified. The performance evaluation of the Naive Bayes model shows an accuracy, precision, recall, and F1-score of 100%. These findings demonstrate the effectiveness of the Naive Bayes algorithm in accurately classifying customer sentiment. Therefore, this study contributes to the application of data-driven sentiment analysis as a supporting tool for improving service quality in the culinary industry.
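For reference, the four metrics reported from a confusion matrix can be computed as in the sketch below. The counts are hypothetical and are not taken from the study; the function name `binary_metrics` is likewise an assumption for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical confusion-matrix counts (illustration only).
scores = binary_metrics(tp=90, fp=5, fn=10, tn=15)
print(scores)
```

When one class heavily outnumbers the other, computing these values separately for the minority class gives a clearer picture than accuracy alone.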
- Research Article
- 10.3390/nano16020134
- Jan 19, 2026
- Nanomaterials (Basel, Switzerland)
- Sung-Kwang Shin + 3 more
Objective: We analyzed nanoparticle regulation research to examine the evolution of regulatory frameworks, identify major thematic structures, and evaluate current challenges in the governance of rapidly advancing nanotechnologies. By drawing parallels with the historical development of radiation regulation, the study aimed to contextualize emerging regulatory strategies and derive lessons for future governance. Methods: A total of 9095 PubMed-indexed articles published between January 2015 and October 2025 were analyzed using text mining, keyword frequency analysis, and topic modeling. Preprocessed titles and abstracts were transformed into a TF-IDF (Term Frequency-Inverse Document Frequency) document-term matrix, and NMF (Non-negative Matrix Factorization) was applied to extract semantically coherent topics. Candidate topic numbers (K = 1-12) were evaluated using UMass coherence scores and qualitative interpretability criteria to determine the optimal topic structure. Results: Six major research topics were identified, spanning energy and sensor applications, metal oxide toxicity, antibacterial silver nanoparticles, cancer nano-therapy, and nanoparticle-enabled drug and mRNA delivery. Publication output increased markedly after 2019, with interdisciplinary journals driving much of the growth. Regulatory considerations were increasingly embedded within experimental and biomedical research, particularly in safety assessment and environmental impact analyses. Conclusions: Nanoparticle regulation has matured into a dynamic multidisciplinary field. Regulatory efforts should prioritize adaptive, data-informed, and internationally harmonized frameworks that support innovation while ensuring human and environmental safety. These findings provide a data-driven overview of how regulatory thinking has evolved alongside scientific development and highlight areas where future governance efforts are most urgently needed.
- Research Article
- 10.71052/hkfb2025/kbiy2015
- Jan 15, 2026
- Hong Kong Financial Bulletin
- Mengfei Xiao + 4 more
This research draws on the verbatim transcript of the complete recording of the Asian Financial Forum opening session held on 26th January 2026 as primary qualitative material. Using close textual interpretation in combination with word frequency and word cloud statistics, sentiment polarity detection, and Term Frequency-Inverse Document Frequency (TF-IDF) with KMeans clustering, the study distills the meeting’s shared understandings, institutional orientations, and strategic signals relevant to firms. The findings indicate that participants framed global uncertainty as a structural and enduring condition and emphasised stabilising long-term expectations through institutional coordination and multilateral cooperation. They highlighted the reduction of fragmentation costs via rule alignment, risk sharing, and enhanced connectivity of cross border financial infrastructure. Hong Kong was positioned as an institutional connector, playing a pivotal interfacing role in the offshore renminbi system, cross market connectivity schemes, and the coordinated development of clearing, settlement, and warehousing networks linked to gold and other commodities. Multilateral development finance was redefined as a builder of confidence and order, providing a stabilising anchor through instrument innovation and co financing when private capital becomes more cautious, while supporting infrastructure investment and the green transition. Sentiment results and negative term patterns suggest that key operational constraints are concentrated in frictions related to clearing, repurchase agreements, taxation, and process costs. The clustering structure further reveals a layered discursive chain spanning policy impetus, cross border institutional linkages, and the operational mechanics of financial markets. 
On this basis, firms should strengthen governance transparency, verifiable compliance, and quantifiable risk management capabilities, and optimise cross border financing and collaboration through institutionalised channels to enhance resilience and sustainable growth under uncertainty.
- Research Article
- 10.1007/s44163-026-00842-y
- Jan 15, 2026
- Discover Artificial Intelligence
- Haifa Alkasem + 4 more
Abstract: This study presents a comprehensive evaluation of machine learning (ML) methods for the early detection of depression in Arabic tweets, addressing critical gaps in mental health informatics for Arabic-speaking populations. We introduce the ArabMindGuard (AMG) dataset, which comprises 3083 expertly annotated tweets from Saudi Arabia, and perform systematic comparisons across five datasets that encompass multiple Arabic dialects, including Saudi, Jordanian, and Modern Standard Arabic (MSA). Our research employs a rigorous evaluation framework, incorporating multiple performance metrics such as precision, recall, F1-score, and statistical significance testing. Leveraging advanced Arabic-specific natural language processing (NLP) techniques (including Term Frequency-Inverse Document Frequency (TF-IDF), N-gram analysis, and CAMeL tools), we assess the performance of nine machine learning classifiers with statistical rigor. The results show that ensemble methods and Support Vector Machines (SVM) consistently outperform other approaches, achieving F1-scores ranging from 0.82 to 0.95 across the datasets. These findings highlight the significant role of dialectal variation in depression detection and provide foundational baseline metrics for future deep learning applications. Overall, this study makes a valuable contribution to culturally responsive digital mental health interventions in the Arab world and establishes methodological benchmarks for Arabic mental health text analysis.
- Research Article
- 10.3390/su18020856
- Jan 14, 2026
- Sustainability
- Yoojin Shin + 1 more
Sustainability has become a central concern globally, and efforts to enhance it are being made across various fields. In line with this trend, corporate sustainability reports have become more widely published. These reports provide both financial and non-financial information on a company’s sustainability. In this context, this study aims, first, to analyze the key keywords contained in CEO messages. Second, it examines whether the keywords emphasized by CEOs change in response to shifts in corporate risk under economic uncertainty. Finally, it identifies how the categories of words included in these messages are classified. To address these research questions, text analysis was selected as the methodology. Specifically, a qualitative research approach using text mining and CONCOR analysis was applied to the text of sustainability reports. According to the Term Frequency and Term Frequency-Inverse Document Frequency analyses, the most frequently occurring keywords were ESG, Sustainable, Society, Stakeholders, Growth, Environment, Effort, and Future. Centrality analysis identified the following keywords as having high centrality: Sustainable, ESG, Society, Environment, Growth, Effort, and Stakeholders. Finally, CONCOR analysis revealed four clusters: Eco-friendly Energy, ESG Management, Global Crisis, and Technological Competitiveness. This study is significant in that it analyzes the major keywords and their changes within unstructured text data using text mining and CONCOR analysis, and it suggests the possibility of future quantitative analysis of non-financial information using these keywords.
- Research Article
- 10.3390/su18020797
- Jan 13, 2026
- Sustainability
- Mariana Lazzaro-Salazar + 8 more
This article critically examines the conceptual boundaries and applications of the terms biocultural and ecocultural in interdisciplinary research addressing biodiversity threats in rural communities. The aim is to clarify their meanings and propose recommendations for their use in sustainability science. We conducted an integrative conceptual review combining a narrative literature analysis and corpus linguistics methods on 54 documents across four disciplinary areas: Ecology and Biodiversity Conservation, Economics and Heritage, Ecocriticism and Literature, and Sociocultural Discourses. The narrative synthesis explores theoretical interpretations, while the corpus analysis quantifies term frequency and collocations to identify patterns of use. The results reveal that biocultural perspectives emphasise species-focused interactions, traditional knowledge, rights, ecoethics, and governance, whereas ecocultural approaches foreground discourse, communication, identity, education, and long-term ecological processes. Both frameworks converge in their concern for sustainability and cultural–ecological interdependence but differ in scope and temporal depth. This study contributes scientifically by offering a situated, interdisciplinary analysis of these concepts, and socially by underscoring the need for dialogical frameworks that respect local knowledge and expand applications beyond rural contexts to urban, educational, and policy domains. Recommendations are provided to guide interdisciplinary teams in adopting context-specific conceptualizations for research and action.
- Research Article
- 10.3390/info17010076
- Jan 12, 2026
- Information
- Abrar Alsayed + 2 more
This paper introduces Saudi Dialects Cyber Violence Detection (SD-CVD) corpus, a large-scale, class-balanced Saudi-dialect corpus for fine-grained cyber violence detection on online platforms. The dataset contains 88,687 Saudi Arabic tweets annotated using a three-level hierarchical scheme that assigns each tweet to one of 11 mutually exclusive classes, covering benign sentiment (positive, neutral, negative), cyberbullying, and seven hate-speech subtypes (incitement to violence, gender, national, social class, tribal, religious, and regional discrimination). To mitigate the class imbalance common in Arabic cyber violence datasets, data augmentation was applied to achieve a near-uniform class distribution. Annotation quality was ensured through multi-stage review, yielding excellent inter-annotator agreement (Fleiss’ κ > 0.89). We evaluate three modeling paradigms: traditional machine learning with TF–IDF and n-gram features (SVM, logistic regression, random forest), deep learning models trained on fixed sentence embeddings (LSTM, RNN, MLP, CNN), and fine-tuned transformer models (AraBERTv02-Twitter, CAMeLBERT-MSA). Experimental results show that transformers perform best, with AraBERTv02-Twitter achieving the highest weighted F1-score (0.882) followed by CAMeLBERT-MSA (0.869). Among non-transformer baselines, SVM is most competitive (0.853), while CNN performs worst (0.561). Overall, SD-CVD provides a high-quality benchmark and strong baselines to support future research on robust and interpretable Arabic cyber-violence detection.
- Research Article
- 10.36341/rabit.v11i1.6924
- Jan 11, 2026
- Rabit : Jurnal Teknologi dan Sistem Informasi Univrab
- Al Ikhsan Faiq + 3 more
Sentiment analysis of e-commerce app reviews is essential to capture user perception and guide service improvements. However, review datasets are typically imbalanced—especially for the neutral class—making accuracy-only evaluation inadequate. This study proposes a hybrid approach that combines IndoBERT fine-tuning with a TF–IDF + logistic regression ensemble, augmented with probability calibration via temperature scaling, a dedicated neutral threshold rule, and a rating-based prior for low-confidence predictions. To avoid data leakage, the dataset is first split using stratified sampling into 72% training, 8% validation, and 20% testing; oversampling is applied only on the training split. Training uses label smoothing and early stopping (patience=2). The best validation configuration achieves macro-F1 of 0.8158 (T=0.941; α=0.70; t_neu=0.55; γ=0.10; τ=0.60). On the test set, the proposed model reaches 86.77% accuracy, 81.71% macro-F1, and 86.76% weighted-F1. An ablation study shows consistent gains from the TF–IDF+LR baseline to the full hybrid model, with the most notable improvement in the neutral class.
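The leakage-avoiding data handling described here (a stratified 72/8/20 split, with oversampling applied only to the training portion) can be sketched as below. The abstract does not say which oversampling technique was used, so random duplication of minority-class examples is an assumption, as are the toy labels and the function names.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.72, 0.08, 0.20), seed=0):
    """Split indices into train/val/test while keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]      # remainder goes to the test split
    return train, val, test

def oversample(indices, labels, seed=0):
    """Randomly duplicate minority-class indices until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    out = []
    for idxs in by_class.values():
        out += idxs + [rng.choice(idxs) for _ in range(target - len(idxs))]
    return out

# Hypothetical imbalanced labels (illustration only).
labels = ["pos"] * 100 + ["neg"] * 50 + ["neu"] * 25
train, val, test = stratified_split(labels)
balanced_train = oversample(train, labels)  # validation and test stay untouched
print(len(train), len(val), len(test), len(balanced_train))
```

Splitting before oversampling matters because duplicated examples would otherwise appear in both training and evaluation data and inflate the reported scores.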
- Research Article
- 10.31937/ti.v17i2.4034
- Jan 8, 2026
- Ultimatics : Jurnal Teknik Informatika
- Jonathan David + 3 more
Student satisfaction with university facilities and services requires in-depth analysis to ensure improvements in unsatisfactory facilities or services while maintaining those that meet expectations. This study aims to analyze sentiment in student satisfaction surveys using Natural Language Processing (NLP) methods. Survey data collected from 2022 to 2024 were analyzed using two main approaches: Naive Bayes (NB) with n-grams (n=1,2,3) employing feature extraction methods such as Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BoW), and Bidirectional Encoder Representations from Transformers (BERT). The analysis results indicate that BERT outperforms NB in terms of sentiment prediction accuracy, although the difference is not highly significant. This study also identified keywords for both positive and negative sentiments. These keywords were then analyzed across 11 categories of facilities and services to provide focused insights into aspects that need to be maintained or improved. This study concludes that sentiment analysis provides significant contributions to universities in evaluating and enhancing the quality of facilities and services according to student preferences.
- Research Article
- 10.3390/healthcare14020140
- Jan 6, 2026
- Healthcare
- Siyun Kim + 7 more
Background/Objectives: Caregivers of infants with congenital muscular torticollis (CMT) frequently seek information online, although the accuracy, clarity, and safety of web-based content remain variable. As large language models (LLMs) are increasingly used as health information tools, their reliability for caregiver education requires systematic evaluation. This study aimed to assess the reproducibility and quality of ChatGPT-5.1 responses to caregiver-centered questions regarding CMT. Methods: A set of 17 questions was developed through a Delphi process involving clinicians and caregivers to ensure relevance and comprehensiveness. ChatGPT generated responses in two independent sessions. Reproducibility was assessed using TF–IDF cosine similarity and embedding-based semantic similarity. Ten clinical experts evaluated each response for accuracy, readability, safety, and overall quality using a 4-point Likert scale. Results: ChatGPT demonstrated moderate lexical consistency (mean TF–IDF similarity 0.75) and high semantic stability (mean embedding similarity 0.92). Expert ratings indicated moderate to good performance across domains, with mean scores of 3.0 for accuracy, 3.6 for readability, 3.1 for safety, and 3.1 for overall quality. However, several responses exhibited deficiencies, particularly due to omission of key cautions, oversimplification, or insufficient clinical detail. Conclusions: While ChatGPT provides fluent and generally accurate information about CMT, the observed variability across topics underscores the importance of human oversight and content refinement prior to integration into caregiver-facing educational materials.
- Research Article
- 10.52435/complete.v6i2.741
- Jan 4, 2026
- Journal of Computer Electronic and Telecommunication
- Alam Rahmatullah + 1 more
The free nutritious meal policy has become a hot topic of discussion among the public because it is related to improving health and education quality. However, its implementation has given rise to a variety of pros and cons that need to be analyzed systematically. This study aims to analyze sentiment toward the policy by utilizing Term Frequency–Inverse Document Frequency (TF-IDF) and Word2Vec as feature extraction methods on public review data obtained from social media X. After undergoing preprocessing and automatic labeling, the data was classified into positive and negative sentiments using the Support Vector Machine (SVM) algorithm. The analysis results show that the sentiment data is unbalanced, with the positive class dominating at 75% and the negative class at 25%. In model testing, TF-IDF achieved an accuracy of 81%, while Word2Vec achieved an accuracy of 80%. This difference suggests that TF-IDF is more stable in handling short and informal texts, while Word2Vec still has the potential to capture the semantic context between words. This research opens up opportunities for further work: it is recommended to balance the data between classes and to combine the TF-IDF and Word2Vec methods, or to use a deep learning approach such as BERT, to obtain more accurate results and capture deeper semantic context.
- Research Article
- 10.33395/sinkron.v10i1.15508
- Jan 4, 2026
- sinkron
- I Gusti Made Ngurah Ari Bhawanaputra + 4 more
Lontar Usada Rare is a traditional Balinese manuscript containing pediatric medical knowledge based on local wisdom, yet its narrative format limits accessibility and utilization in modern contexts, while its physical fragility threatens long-term preservation. This study aims to develop a pediatric disease classification model using a Support Vector Machine (SVM) combined with Term Frequency–Inverse Document Frequency (TF-IDF) weighting to support the digitalization of Balinese traditional medicine. A total of 422 data samples were collected through expert interviews and manuscript analysis, covering symptoms, disease types, herbal ingredients, and treatment procedures. The research stages included text preprocessing (cleansing, tokenizing, stopword removal, stemming), manual labeling into 35 disease classes, and model evaluation using five train–test split ratios (80:20 to 60:40) with variations of the complexity parameter C (0.5, 1, 10, 100, 1000). The best performance was achieved using C=10 with an 80:20 ratio, resulting in 87.06% accuracy, 91.55% precision, 87.06% recall, and an F1-score of 87.96%. Confusion matrix analysis showed strong classification performance for most classes, although minority classes with overlapping symptoms exhibited misclassification. Overall, the TF-IDF and linear SVM combination effectively classifies pediatric disease symptoms from Lontar Usada Rare and contributes to the preservation and digital transformation of Balinese traditional medical knowledge for potential modern healthcare applications.
- Research Article
- 10.59896/aqlu.v4i1.494
- Jan 3, 2026
- Al-Aqlu: Jurnal Matematika, Teknik dan Sains
- Alvin Febrian + 1 more
The development of digital streaming platforms has led to information overload, making it difficult for users to choose a movie. This study aims to design and implement a movie recommendation system to address this issue. The method used is Content-Based Filtering (CBF), which focuses on textual content analysis. This system uses the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to weight words in movie synopses, and Cosine Similarity to calculate similarities between movies. The results of the study indicate that the system was successfully implemented. Functional tests showed that the system was able to provide highly relevant recommendations when the synopsis keywords were unique, such as in the 'Batman' test. However, the system also showed limitations when the keywords were ambiguous, such as in the 'Hulk' test, which incorrectly matched the name "Bruce". For quantitative accuracy evaluation, the system was tested using the Precision@k metric and achieved an average precision value of 30.00% at P@5. The conclusion of this study is that the synopsis-based CBF method was successfully implemented, but its performance was shown to be highly dependent on the quality and uniqueness of the keywords in the synopsis data.
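The Precision@k metric mentioned in this abstract is the share of the top k recommendations that are relevant. A minimal sketch follows; the item names and relevance sets are hypothetical, and how the study judged relevance is not stated in the abstract.

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the first k recommended items that appear in the relevant set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

def mean_precision_at_k(runs, k=5):
    """Average Precision@k over several (recommended, relevant) test cases."""
    return sum(precision_at_k(rec, rel, k) for rec, rel in runs) / len(runs)

# Hypothetical test cases (illustration only).
runs = [
    (["a", "b", "c", "d", "e"], {"a", "c", "z"}),  # 2 hits out of 5
    (["f", "g", "h", "i", "j"], {"j"}),            # 1 hit out of 5
]
average = mean_precision_at_k(runs)
print(average)
```

An average P@5 of 30% therefore corresponds to about 1.5 relevant titles per list of five recommendations.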
- Research Article
- 10.47467/reslaj.v8i1.10370
- Jan 2, 2026
- Reslaj: Religion Education Social Laa Roiba Journal
- Grace Trifosa Sagala + 1 more
The rapid development of digital financial technology in Indonesia has led to increased use of mobile banking applications as the primary medium for customer transactions. Two widely used platforms are BCA Mobile and BRImo, both of which receive various user reviews on the Google Play Store. This study aims to analyze user sentiment toward these applications using the Multinomial Naïve Bayes method and TF–IDF feature weighting. The dataset consists of 18,034 reviews that have undergone text preprocessing, including cleansing, case folding, tokenizing, normalization, filtering, and stemming, as well as lexicon-based sentiment labeling into two classes: positive and negative. The evaluation results show that the developed model achieves an accuracy of 96% for BRImo and 91% for BCA Mobile, demonstrating strong performance in identifying and distinguishing positive and negative reviews. Overall, reviews for BRImo are predominantly positive, while BCA Mobile also shows a positive tendency but receives a higher number of complaints. These findings are expected to provide useful insights for mobile banking developers in improving service quality and enhancing user experience.
- Research Article
- 10.31294/co-science.v6i1.10445
- Jan 1, 2026
- Computer Science (CO-SCIENCE)
- Musriatun Napiah + 3 more
The Makanan Bergizi Gratis (MBG) program is a strategic initiative of the Indonesian government to improve the nutritional quality of schoolchildren. This research seeks to examine public sentiment regarding the MBG program by leveraging 10,000 tweets obtained from Kaggle. The method used combines Natural Language Processing (NLP) and Machine Learning approaches; several algorithms, namely Logistic Regression, Support Vector Machine (SVM), Random Forest, Naive Bayes, XGBoost, and LightGBM, were tested to compare classification performance. The dataset contains a collection of public reviews categorized into three sentiment classes: positive, negative, and neutral. The analysis process includes text cleaning, tokenization, stopword removal, and stemming to obtain a cleaner text representation. Text features were then extracted using the Term Frequency–Inverse Document Frequency (TF-IDF) method. The results showed that the Logistic Regression model performed best, scoring 97% with an F1-score of 0.9552. Sentiment analysis revealed 65% positive responses, 25% neutral, and 10% negative, with the dominant keywords being “nutrisi,” “sehat,” “anak sekolah,” and “gratis.” The visualizations of the results, in the form of a Word Cloud and a bar chart, indicate that public opinion tends to be positive towards the implementation of the MBG program, particularly regarding improving the nutrition of schoolchildren. This research is expected to provide input for policymakers in evaluating public perceptions of the implementation of food-based social programs.
- Research Article
- 10.1177/21582440261415716
- Jan 1, 2026
- Sage Open
- Yiyang Hu + 1 more
Corpus-based translation studies (CTS) has been shaped by theoretical developments within translation studies, while also incorporating insights from related fields such as linguistics and digital humanities. Although prior reviews have noted the field’s increasing methodological diversity, this complexity has made it challenging for new researchers seeking to understand the field’s trends and developments. Bibliometric analysis offers a systematic, data-driven approach to exploring scholarly activity and provides a valuable means to address this gap. However, its application in CTS has remained limited in recent years. This study addresses the gap by conducting a bibliometric analysis of 477 research articles published between 2015 and 2024, retrieved from the Web of Science (WoS) database. By examining publication trends, term frequencies, and citation patterns, it identifies shifts in key research themes and the underlying intellectual structures of CTS. The findings reveal that CTS has expanded and refined its research focus over the past decade, with a predominant emphasis on translation universals and a noticeable rise in interpreting studies. Network analyses further elucidate the intricate relationships among these themes, providing a deeper understanding of the field’s evolving dynamics.
- Research Article
- 10.3389/fimmu.2026.1745842
- Jan 1, 2026
- Frontiers in immunology
- Xuan Tang + 4 more
In recent years, researchers have identified numerous potential biomarkers and therapeutic targets applicable to cancer immunotherapy, among which the role of tumor-internal microorganisms in the tumor microenvironment has been explored. However, this field is still in its early stages of development, facing limitations such as the unclear mechanisms of interaction between tumor-internal microorganisms and host immunity, as well as significant variations in microbial profiles among different tumor types and patients. This study aims to explore the research hotspots and development trends of tumor-internal microorganisms through bibliometric methods and to construct a systematic knowledge map. This study retrieved publications related to tumor-internal microorganisms from the Web of Science Core Collection (WOSCC) prior to December 22, 2025. Subsequently, the selected literature was analyzed using VOSviewer (v.1.6.20), CiteSpace (v.6.4.1R), and SCImago Graphica. In addition, we integrated PubMed data to assess status and trends in preclinical and clinical studies of intratumoral microbiota interventions for anti-tumor therapy efficacy. From the Web of Science database, we retrieved 1,278 relevant articles. Since 2012, the number of papers published on the intratumoral microbiota has shown an overall upward trend. China and the United States are the two major countries in this field. Keyword analysis shows that "tumor microbiome," "gut microbiome," "cancer," and "Fusobacterium nucleatum" are frequent terms. Eleven keyword groups were identified, among which "tumor immunotherapy" and "immune microenvironment" form two important groups. A total of 69 preclinical and clinical studies have intervened in intratumoral microbiota and affected anti-tumor treatment outcomes. Among them, 25 studies involving Fusobacterium nucleatum account for a large proportion. 
However, most of these studies are still at the basic or preclinical stage, and clinical translation evidence is limited.
- Research Article
- 10.1155/joot/9692976
- Jan 1, 2026
- Journal of Transplantation
- Haneen Al-Abdallat + 7 more
Introduction: The integration of artificial intelligence (AI) in liver and kidney transplantation (LKT) research has surged in recent years, promising novel approaches to address traditional statistical challenges and enhance result robustness and generalizability. This study aims to explore the extent of international collaboration and the evolution of research trends in AI applications for LKT. Methods: On August 12, 2025, a systematic search was conducted using the Web of Science database to identify relevant literature. Bibliometric tools, including the “bibliometrix” package in R, VOSviewer, and Microsoft Excel, were used. Key indicators such as country contributions, multiple‐country publications, single‐country publications, co‐authorship, and keyword co‐occurrence were examined to assess collaboration patterns and research hotspots. Inclusion criteria involved all published peer‐reviewed articles related to AI in LKT. Editorials, corrections, and irrelevant documents were excluded. Results: A total of 633 articles published between 1994 and 2025 were included in the analysis. These collectively received 8959 citations. The United States of America emerged as the leading contributor, accounting for 37.12% of the publications, followed by China and South Korea. Notably, international co‐authorship was evident in 30.02% of the publications. Keyword analysis revealed that “survival,” “outcomes,” “risk,” “mortality,” and “prediction” were the most frequent terms, highlighting them as hotspots in transplantation research. Conclusion: The field of AI in LKT research is characterized by growing international collaboration, although participation is still uneven and concentrated in high‐income countries. In order to advance the field and enhance outcomes across diverse patient populations, it will be crucial to strengthen global data‐sharing and cultivate equity‐focused, culturally adaptable AI models.