Articles published on Text Mining Algorithms
- Research Article
- 10.1016/j.compbiomed.2026.111599
- Apr 1, 2026
- Computers in biology and medicine
- Clodomir Santana + 19 more
The validation of promising clinical biomarkers, molecular mechanisms, and novel drug targets in cardiovascular disease (CVD) is hindered by a vast and fragmented biomedical literature, which now exceeds 38 million publications indexed in PubMed. To address the central challenge of navigating and synthesizing a huge fragmented biomedical literature base, we applied our validated machine learning-based text-mining algorithm containing natural language processing (NLP) and incorporated this into a ValIdated Text-mining using Advanced Language model (VITAL) as a complementary framework. Using this approach, we analyzed more than 38 million PubMed abstracts and identified over 5.5 million relevant to six major CVD groups. These curated data then enabled a deep-dive case study on heart failure with preserved ejection fraction (HFpEF). Our computational framework systematically queried, quantified, mapped, and prioritized protein-disease associations, confirming established CVD biomarkers, such as BNP, troponin-I, galectin-3, and renin, and revealing novel protein signatures with potential diagnostic and therapeutic relevance. Ischemic heart disease (IHD, heart attacks), cardiomyopathy (CM, leading to heart failure), and cerebrovascular accidents (CVA, strokes and brain hemorrhages) exhibited the highest protein attribution densities and overlap, suggesting shared molecular pathways. Using HFpEF as a focused case study, our framework identified 5124 proteins associated with this condition, 4879 of which were shared across its major comorbidities (aging, type 2 diabetes/obesity, hypertension, and hyperlipidemia). Additionally, 4991 proteins were co-shared across key pathological mechanisms, including inflammation, mitochondrial dysfunction, and fibrosis, implicating convergent biological networks spanning these domains. 
To further characterize and prioritize these molecular associations, we performed a series of data science-driven analyses of HFpEF-associated proteins. The top computationally ranked HFpEF protein candidates were also the top-ranked proteins in the comorbidity domains and in the pathology domains, suggesting that these proteins are important drivers within convergent molecular networks underlying HFpEF. Cross-referencing the top-ranked computational HFpEF protein candidates against clinical myocardial and extracardiac biopsy data from HFpEF patients and corresponding controls revealed that most of these proteins are predominantly expressed in the liver, pancreas, adipose tissue, and lymph nodes rather than in cardiac tissue. This finding supports the emerging concept that HFpEF is fundamentally a multisystemic disorder mediated by inter-organ signaling rather than a disease confined to the heart. Our computational study demonstrates the capacity of text mining to annotate, integrate, and prioritize protein-disease relationships from large-scale textual data, thereby providing a complementary framework to traditional omics approaches for biomarker discovery and drug target identification in CVDs.
- Research Article
- 10.1371/journal.pdig.0001230.r003
- Feb 10, 2026
- PLOS Digital Health
- Britt W M Van De Burgt + 13 more
Adverse drug reactions (ADRs) pose a significant challenge in healthcare. While structured documentation of ADRs in electronic health records (EHRs) enables automated alerting, many ADRs are recorded as unstructured free text, which limits detection. Text mining (TM) shows potential for extracting clinically relevant data from unstructured text. However, the portability of TM algorithms across institutions and departments remains uncertain because EHR structures and documentation practices vary. Evaluating portability is therefore essential for ensuring that these general-purpose algorithms perform effectively across diverse clinical settings. This study aimed to evaluate the portability of a previously developed TM-based ADR identification algorithm by assessing its performance on EHRs from two different departments in two different hospitals. EHR free-text data from 62 hospitalized patients in the geriatric and orthopedic departments of two Dutch teaching hospitals were reviewed for ADRs via manual review and the TM algorithm. Performance was evaluated using F-score, sensitivity, and positive predictive value (PPV), with comparisons across hospitals and departments. Manual review identified 359 unique ADRs. The TM algorithm detected 534 potential ADRs (pADRs), 286 of which overlapped with manual review, yielding an F-score of 0.64, sensitivity of 80%, and PPV of 54%. Performance was consistent across hospitals and departments. Notably, 26 pADRs identified by the algorithm were clinically relevant yet missed in manual review. This study demonstrates the portability of the TM algorithm, which identified pADRs across different hospitals and departments without adaptations. These findings support its potential for broader implementation in ADR detection across diverse healthcare settings.
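The reported performance figures can be reproduced from the three counts in the abstract. The sketch below is illustrative only: it assumes the F-score is the balanced F1 (the abstract does not state a weighting) and that manual review is the reference standard, so the 26 clinically relevant pADRs missed by manual review count against PPV here. The function and variable names are hypothetical.

```python
def detection_metrics(n_reference, n_flagged, n_overlap):
    """Return (sensitivity, PPV, F1) from raw counts.

    n_reference: items found by the reference standard (manual review)
    n_flagged:   items flagged by the algorithm
    n_overlap:   items found by both
    """
    sensitivity = n_overlap / n_reference   # share of reference items detected
    ppv = n_overlap / n_flagged             # share of flagged items confirmed
    # Assumption: balanced F1, the harmonic mean of sensitivity and PPV.
    f_score = 2 * sensitivity * ppv / (sensitivity + ppv)
    return sensitivity, ppv, f_score


# Counts taken from the abstract: 359 manual ADRs, 534 pADRs, 286 overlapping.
sensitivity, ppv, f_score = detection_metrics(359, 534, 286)
print(f"sensitivity={sensitivity:.2f} ppv={ppv:.2f} F={f_score:.2f}")
```

Under these assumptions the rounded values agree with the abstract (80%, 54%, 0.64).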
- Research Article
- 10.1371/journal.pdig.0001230
- Feb 1, 2026
- PLOS Digital Health
- Britt W M Van De Burgt + 12 more
- Research Article
- 10.55643/fcaptp.6.65.2025.5082
- Dec 31, 2025
- Financial and credit activity problems of theory and practice
- Anton Boiko + 4 more
The article examines opportunities for improving the financial transparency and effectiveness of international technical assistance (ITA) in Ukraine under martial law by improving the tools of the Prozorro public procurement system. The authors emphasise that during a full-scale war, the role of ITA increases significantly, as it provides funding for critical infrastructure, humanitarian needs, and institutional development. At the same time, corruption risks and restrictions on procurement transparency significantly affect the trust of international partners. The article argues that automated digital financial control solutions are needed to improve the analytical capabilities of Prozorro and strengthen the system for auditing public procurement results. Two text mining algorithms, Latent Dirichlet Allocation (LDA) and BERTopic, are proposed for analysing large amounts of tender documentation, identifying structural patterns, and identifying anomalous topics that may indicate atypical financial expenses of recipients, abnormal pricing, and misuse of funds. Practical testing on Prozorro data shows that LDA forms generalized thematic clusters that reflect the areas of expenditure of international technical assistance funds, while BERTopic yields detailed subtopics, detects atypical text patterns, and identifies purchases with increased financial risks. The results of the study indicate that integrating the proposed models into the Prozorro system can significantly strengthen anti-corruption and financial control, reduce potential losses of budget and donor resources, optimize the use of international technical assistance, and contribute to building a higher level of trust between Ukraine and its international partners in the process of military and post-war reconstruction.
- Research Article
- 10.1007/s11764-025-01935-w
- Dec 3, 2025
- Journal of cancer survivorship : research and practice
- Michelle A Mollica + 3 more
The National Cancer Institute (NCI) identified survivorship research for people living with advanced and metastatic cancers as a priority in 2020. The goal of this portfolio analysis was to review all National Institutes of Health (NIH) grants newly funded in Fiscal Year (FY) 2021 to FY 2024 focused on this area. Grants were identified using a text mining algorithm of survivorship-relevant terms from the NIH Research, Condition, and Disease Categorization (RCDC) system and double coded for grant characteristics (e.g., study design, cancer type, primary focus, and primary outcomes). Included in this analysis were 63 grants, funded by 7 NIH institutes. The number of newly awarded grants rose from 9 in 2021 to 25 in 2024. The majority of grants were R01s (62.9%). Cancer types most often included were breast (22.2%), prostate (15.9%), and lung (15.9%). The primary focus of studies was most often acute toxicities (33.3%) or late- or long-term effects (33.3%). No grants focused on the financial impact of an advanced or metastatic cancer diagnosis. Results indicate substantial growth in the number of grants funded from FY 2021 to FY 2024. Grants address some of the gap areas identified in previous NCI efforts, including longitudinal studies of symptoms. Scientific gap areas that persist include financial hardship and employment, models of care delivery, and prognostic uncertainty. Despite the growth in the grant portfolio, more research on survivors living with advanced and metastatic cancers is needed. This emerging population has unmet and understudied needs that will only expand given the growth in new therapies and targeted treatments.
- Research Article
- 10.1007/s11060-025-05257-w
- Oct 15, 2025
- Journal of neuro-oncology
- Alessandra Andreotti + 19 more
Central nervous system (CNS) tumors represent a heterogeneous group of neoplasms with significant clinical impact and variable prognosis. Despite the relatively low incidence, they account for considerable morbidity and mortality. In Italy, population-based data on incidence and survival by histological subtype and tumor grade remain limited, particularly for rarer CNS tumor entities. We conducted a retrospective population-based study using data from the Veneto Cancer Registry, including adults diagnosed with CNS tumors between 2016 and 2020. A dedicated text-mining algorithm was applied to pathology reports to extract tumor grade. Tumors were categorized into six main histological groups. We estimated incidence rates, relative survival, and 5-year conditional relative survival, stratified by sex, age, tumor type, and grade. A total of 1,636 incident CNS tumors with confirmed histopathology and intermediate to high-grade behavior were identified. Glioblastoma was the most frequent subtype (64.6%), followed by grade 2-3 meningiomas (18.2%) and astrocytomas (9.4%). The overall crude incidence was 8.0 per 100,000, higher in males (9.5) than females (6.6). Five-year relative survival varied substantially by tumor type and grade: glioblastoma had the poorest outcome (5.7%), while grade 2-3 ependymomas and oligodendrogliomas showed favorable prognosis (87.7% and 82.0%, respectively). Conditional 5-year survival after surviving one year remained low for glioblastoma (11.0%) but exceeded 85% for most lower-grade tumors. Our findings underscore the prognostic relevance of tumor grade and histology, supporting the need for tailored clinical strategies, molecular diagnostics, and the development of innovative therapies for informed healthcare planning and resource allocation.
- Research Article
- 10.31289/jppuma.v13i2.15700
- Oct 6, 2025
- JPPUMA Jurnal Ilmu Pemerintahan dan Sosial Politik Universitas Medan Area
- Yermia Hendarwoto
This study evaluates Indonesian public perceptions of the application of Pancasila Legal Philosophy in international diplomacy through sentiment analysis on Twitter. Using text mining and machine learning algorithms—Naïve Bayes, Support Vector Machine (SVM), and Random Forest—1,000 tweets containing keywords such as “Pancasila diplomacy,” “Indonesia at the UN,” and “Indonesian foreign policy” were collected and classified into positive, negative, and neutral categories. The distribution of sentiment indicates that 60% of tweets express positive perceptions, highlighting pride in Indonesia’s promotion of Pancasila values in global forums, 25% remain neutral, and 15% are negative, reflecting criticism of perceived inconsistencies between Pancasila and diplomatic practice. Model evaluation employed a confusion matrix and metrics of accuracy, precision, and recall across sentiment classes. Results demonstrate that Random Forest outperformed other models with 91% accuracy, stable precision, and recall across all classes. By comparison, SVM achieved 89% accuracy with consistent performance in high-dimensional text data, while Naïve Bayes recorded 85% accuracy but was less effective in handling class imbalance, particularly in neutral–negative distinctions. The Random Forest model explained the greatest variance in sentiment classification, confirming its strength in processing short and contextually complex texts such as tweets. Practically, these findings provide a foundation for developing a real-time sentiment monitoring system to support adaptive and participatory diplomacy. Integrating sentiment analysis into policy design enables the Ministry of Foreign Affairs to anticipate public responses, strengthen diplomacy narratives rooted in Pancasila values, and build a data-driven ecosystem for public diplomacy. This contributes to inclusive, ethical, and responsive foreign policy aligned with Indonesia’s state philosophy.
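As a rough illustration of the evaluation step described above, accuracy and per-class precision and recall can be read off a three-class confusion matrix. The counts below are invented for the example and are not taken from the study; only the row totals are chosen to loosely mirror the reported 60/25/15 class split. All names are hypothetical.

```python
LABELS = ["positive", "neutral", "negative"]

# Hypothetical counts (NOT from the study): rows = true class, columns = predicted class.
CONFUSION = [
    [50, 5, 5],    # true positive-sentiment tweets
    [4, 20, 1],    # true neutral tweets
    [2, 3, 10],    # true negative-sentiment tweets
]


def per_class_metrics(matrix, labels):
    """Return overall accuracy and {label: (precision, recall)}."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {}
    for i, label in enumerate(labels):
        predicted = sum(row[i] for row in matrix)   # column sum: predicted as this class
        actual = sum(matrix[i])                     # row sum: truly this class
        precision = matrix[i][i] / predicted if predicted else 0.0
        recall = matrix[i][i] / actual if actual else 0.0
        report[label] = (precision, recall)
    return correct / total, report


accuracy, report = per_class_metrics(CONFUSION, LABELS)
print(f"accuracy={accuracy:.2f}")
for label, (precision, recall) in report.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

Reporting precision and recall per class, rather than accuracy alone, is what exposes weaknesses on minority classes such as the neutral and negative categories.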
- Research Article
- 10.1016/j.ophtha.2025.04.026
- Oct 1, 2025
- Ophthalmology
- Cecilia S Lee + 12 more
A Data-driven Age-related Macular Degeneration Severity Scoring System Leveraging the AREDS Studies and Clinical Electronic Medical Records.
- Research Article
- 10.35377/saucis...1626239
- Sep 30, 2025
- Sakarya University Journal of Computer and Information Sciences
- Hüseyin Ataseven + 1 more
This study compares the classification accuracy of text mining algorithms for foreign language proficiency exam items. The dataset included 2,868 items from ÜDS English tests (2006–2012) across Natural and Applied Sciences (n=956), Health Sciences (n=956), and Social Sciences (n=956). Algorithms tested were k-Nearest Neighbors (kNN), Naïve Bayes (NB), Naïve Bayes-Kernel (NB-K), Random Forest (RF), and Support Vector Machines (SVM). Binary classification accuracies ranged from 83.08% (NB) to 92.48% (SVM), while multiclass accuracies ranged from 71.93% (NB) to 84.96% (kNN). Expert analysis and cross-validation identified class-inconsistent items that negatively affected accuracy. Removing these items improved binary classification by 7.39%–9.83% and multiclass classification by 10.58%–17.89%. Among algorithms, kNN was least impacted by class-inconsistent data. These findings highlight the importance of addressing inconsistencies for improving algorithmic performance, with kNN showing robust results across scenarios.
- Research Article
- 10.54254/2755-2721/2025.gl27106
- Sep 24, 2025
- Applied and Computational Engineering
- Yiyi Cai
Environmental, Social, and Governance (ESG) investing has gained unprecedented momentum in global financial markets, driving the need for sophisticated analytical frameworks that can process vast amounts of unstructured information. This research presents a comprehensive investigation into the application of natural language processing techniques for ESG news sentiment analysis and its subsequent impact on investment portfolio performance. The study develops a multi-dimensional sentiment analysis model that extracts ESG-related information from financial news sources, incorporating advanced text mining algorithms to quantify sentiment scores across environmental, social, and governance dimensions. Through empirical analysis of portfolio performance metrics, the research demonstrates that ESG sentiment-driven investment strategies yield superior risk-adjusted returns compared to traditional approaches. The methodology integrates real-time news processing capabilities with portfolio optimization algorithms, enabling dynamic allocation decisions based on sentiment-derived ESG signals. Experimental results indicate a 50.8% improvement in Sharpe ratio and 17.3% reduction in portfolio volatility when incorporating ESG sentiment analysis. The findings contribute to the advancement of sustainable finance technology and provide practical insights for institutional investors seeking to enhance portfolio performance through alternative data integration.
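For readers unfamiliar with the headline metric, the Sharpe ratio is conventionally the mean excess return divided by the standard deviation of excess returns, optionally annualized. The abstract does not state which variant was used, so the sketch below is an assumption: it uses the sample standard deviation and a square-root-of-time annualization, and the return series is made up for illustration.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period returns.

    Assumptions: a constant per-period risk-free rate, the sample standard
    deviation, and sqrt(periods_per_year) scaling.
    """
    excess = [r - risk_free for r in returns]
    volatility = statistics.stdev(excess)
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only.
daily_returns = [0.01, 0.02, 0.03]
print(sharpe_ratio(daily_returns, periods_per_year=1))
```

A percentage improvement in Sharpe ratio, as quoted in the abstract, would then be the relative change between the ratios of two strategies evaluated over the same period.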
- Research Article
- 10.1371/journal.pmed.1004721
- Sep 16, 2025
- PLOS Medicine
- Emma N Cleary + 9 more
Background: The extent to which the documented association between prenatal prescribed opioid analgesic (POA) exposure and neurodevelopmental disorders in children is causal or due to confounding is unknown. The objective of this study was to evaluate associations between dose and duration of POA exposure during pregnancy and autism spectrum disorder (ASD) or attention-deficit/hyperactivity disorder (ADHD) in children while minimizing bias due to confounding and other sources. Methods and findings: This retrospective study analyzed a population-based cohort of births using national register data from Sweden. The ASD analysis cohort consisted of 1,267,978 children born in Sweden from July 1st, 2007 to December 31st, 2018, with follow-up through 2021. A shorter eligibility period was used to study ADHD given its later age of typical diagnosis, consisting of 918,771 children born through December 31st, 2015. Text-mining algorithms were used to derive cumulative dose and duration of POA exposure during pregnancy from filled POA prescriptions, as well as to identify prescriptions that were to be taken on an “as needed” basis. Outcomes were identified through inpatient or outpatient clinical diagnosis of ASD and ADHD or dispensed ADHD medications. Cox proportional hazards regression models were adjusted for measured covariates from multiple domains. Several designs were used to help address unmeasured confounding: comparisons with children whose birthing parent had a diagnosed painful condition but did not receive POAs, children whose birthing parent received POAs in the year before but not during pregnancy, and siblings who were not exposed to POAs. Of the 1,267,978 children, 48.6% were female and 4.4% were exposed to POAs during pregnancy. At age 10, cumulative incidence of ASD was 2.0% among children unexposed to POAs, 2.9% among children exposed to a low dose across pregnancy, and 3.6% among children exposed to a high dose.
In unadjusted models (e.g., hazard ratio [HR_high], 1.74, 95% confidence interval [CI], 1.63, 1.87) and when accounting for measured covariates, cumulative maximum dose was associated with increased risk of ASD (e.g., HR_high, 1.34, 95% CI, 1.24, 1.44). However, the associations were largely or fully attenuated when using alternative designs (particularly when comparing to children whose birthing parent received POAs before but not during pregnancy: HR_high, 1.10, 95% CI, 1.00, 1.21). No associations were observed in the sibling comparison (HR_high, 0.99, 95% CI, 0.81, 1.21). This overall pattern of associations was also observed when considering duration of exposure, and in numerous sensitivity analyses, as well as for analyses of ADHD. A main limitation of this study was that the distribution of dose and duration of POAs prescribed to birthing parents in Sweden limited our ability to explore the effects of extremely high dose and duration on risk for neurodevelopmental disorders. Conclusions: While increased risks with high amounts of POA exposure cannot be ruled out, the results suggest that confounding may largely explain the increased risks of ASD and ADHD associated with prenatal POA exposure at the levels observed in this cohort.
- Research Article
- 10.29140/tltl.v7n2.102841
- Sep 15, 2025
- Technology in Language Teaching & Learning
- Kadir Karakaya + 4 more
The increasing integration of generative Artificial Intelligence (AI) tools, such as ChatGPT, in education has prompted growing interest in their pedagogical potential and the emergent competencies required for their effective use in language instruction. While generative AI is beginning to influence language teaching and learning practices, emerging research suggests a growing need to address AI-related literacies and ethical considerations within language teacher education programs. Despite the growing number of studies examining generative AI’s use in language learning contexts, there remains a notable gap in systematically reviewing how generative AI is being addressed in teacher preparation and professional development. To address this gap, this study presents a bibliometric-based systematic literature review of research on generative AI in language teacher education, employing text-mining algorithms, data-mining heuristics, and social network analysis. The findings identify five major thematic clusters in the literature: (1) Professional Development and AI Literacy in Teacher Education, (2) Chatbots and Conversational AI in Language Learning, (3) Generative AI for Instructional Design, Assessment, and Lesson Planning, (4) Generative AI as a Tool for Enhancing EFL Writing Skills, and (5) Exploring Pre-Service Teachers’ Perceptions and Readiness. This review contributes to the growing discourse on AI in education by mapping the current research landscape and identifying critical directions for advancing generative AI integration in language teacher education.
- Research Article
- 10.3390/jcm14165615
- Aug 8, 2025
- Journal of Clinical Medicine
- Suryeon Ryu + 3 more
Background/Objectives: Physical activity (PA) is widely recognized as a beneficial approach to improving the health-related quality of life (HRQoL) of breast cancer survivors. This study explored key research topics and emerging trends in studies related to PA and HRQoL among breast cancer survivors. Methods: Titles and abstracts of 3847 English-language research articles (2000–2024) were retrieved from PubMed, EMBASE, Web of Science, and Scopus using keywords related to ‘breast cancer’, ‘PA/exercise’, and ‘HRQoL’. A text-mining algorithm based on the Dirichlet-multinomial regression approach in Python was applied to identify the top 10 research topics and their trends over time. Results: In total, 10 key topics emerged: (1) Quality of Life and Well-being, (2) Cancer Treatment and Health-Related Fitness, (3) Supportive Care and Psychosocial Factors, (4) Survivorship, Palliative Care, and Integrative Medicine, (5) Physical Activity and Sedentary Behaviors, (6) Upper Limb-Related Side Effects, (7) Cancer-Related Fatigue and Symptoms, (8) Epidemiological and Clinical Factors, (9) Side Effects of Cancer Treatment, and (10) Weight Management. Among these, Topics 1, 2, 3, 8, and 9 followed upward trajectories, while others showed relatively stable trends. Conclusions: Findings highlight that PA research on breast cancer survivors’ HRQoL spans all stages of survivorship and considers both clinical outcomes and psychosocial and emotional well-being. Understanding how PA and HRQoL have been represented in research helps clarify which survivor needs have received attention and which remain underexplored. These thematic patterns underscore growing acknowledgement of survivors’ lived experiences and offer a roadmap for addressing future research and care gaps.
- Research Article
- 10.20414/light.v5i1.11381
- Jul 31, 2025
- THE LIGHT : Journal of Librarianship and Information Science
- Juwita Kusumaningtyas
This study explores the application of text mining algorithms to analyze book reviews on social media with the goal of improving library services. Analyzing 5,000 book reviews revealed that the majority are positive, particularly for genres like science fiction and fantasy. Text classification identified key themes in reviews, such as character development, plot, and writing quality, with character development being the most prominent. Entity extraction highlighted frequently mentioned authors and books, indicating reader interest in specific works. These findings suggest that the application of text mining algorithms can assist libraries in updating their collections, aligning with reader preferences, and designing more relevant programs. This data-driven approach contributes to enhancing library service efficiency and advancing information technology in collection management.
Keywords: text mining, book reviews, sentiment analysis, libraries, collection development
- Research Article
- 10.3390/su17125332
- Jun 9, 2025
- Sustainability
- Hogyeong Jeong + 4 more
Rural plans incorporating regional identity are vital for fostering regional revitalization and offering viable policy alternatives. The need for a systematic approach that recognizes both the diversity and shared characteristics of rural areas has become increasingly clear. Although numerous studies have explored rural classification, research examining specific regional characteristics remains limited. Hence, this study aimed to establish a comprehensive standard for developing effective rural plans. To this end, the characteristics of rural areas were classified using topic modeling, a text-mining algorithm. An analysis of publications on rural revitalization projects in Korea over the past decade revealed five success factors common across regions: “local cultural experience”, “environment and landscape utilization”, “community activation”, “regional infrastructure development”, and “local economic activation”. These factors should be considered in rural areas with different characteristics when establishing rural plans and policies, and their classification serves as a foundation for doing so. Applying different weights to the five success factors according to the unique characteristics and conditions of each region would make it possible to establish more precise and effective customized plans. Future research is required to provide more empirical and broadly applicable results based on the classification framework proposed in this study.
- Research Article
- 10.1371/journal.pone.0321202
- May 7, 2025
- PLOS ONE
- Mingyue Wang + 2 more
Prior research has tended to disregard the dynamic nature of customer satisfaction in online shopping and how it influences corporate marketing decisions. This study introduces a dynamic online shopping customer satisfaction index model and devises a new text mining algorithm to quantify online reviews, testing and analyzing the model to reveal the intrinsic mechanism and evolutionary characteristics of online shopping customer satisfaction. Findings reveal disparities between the online shopping customer satisfaction index model and the American customer satisfaction index model. Specifically, customer expectations significantly impact customer loyalty, while customer loyalty influences complaint rates. The study also highlights the impact of COVID-19, which has intensified competition and underscored the importance of perceived quality and brand image. Our findings provide a reference for e-commerce enterprises seeking to make data-driven marketing decisions.
- Research Article
- 10.1038/s41598-025-91622-8
- Mar 4, 2025
- Scientific Reports
- Yi Jie Wang + 4 more
In the rapidly evolving field of healthcare, Artificial Intelligence (AI) is increasingly transforming traditional healthcare and improving medical diagnostic decisions. The overall goal of this study is to uncover emerging trends and potential future paths of AI in healthcare by applying text mining to scientific papers and patent information. Using advanced text mining and multiple deep learning algorithms, the study drew on the Web of Science for scientific papers (1587) and the Derwent Innovations Index for patents (1314) from 2018 to 2022 to study future trends of emerging AI in healthcare. A novel self-supervised text mining approach, leveraging bidirectional encoder representations from transformers (BERT), is introduced to explore AI trends in healthcare. The findings point to market trends in the Internet of Things, data security, and image processing. This study not only reveals current research hotspots and technological trends in AI for healthcare but also proposes an advanced research method. Moreover, by analysing patent data, this study provides an empirical basis for exploring the commercialisation of AI technology, indicating potential transformation directions for future healthcare services. Early technology trend analysis relied heavily on expert judgment. This study is the first to introduce a deep learning self-supervised model to the field of AI in healthcare, effectively improving the accuracy and efficiency of the analysis. These findings provide valuable guidance for researchers, policymakers, and industry professionals, enabling more informed decisions.
- Research Article
- 10.1016/j.esmorw.2024.100109
- Mar 1, 2025
- ESMO real world data and digital oncology
- L Mazzeo + 29 more
Data analytics for real-world data integration in TKI-treated NSCLC patients using electronic health records.
- Research Article
- 10.47065/bulletinds.v4i2.6416
- Feb 28, 2025
- Bulletin of Data Science
- Fajar Surya Atmaja
Dr. Pirngadi Regional General Hospital (RSUD Dr. Pirngadi) is a government-owned type B regional general hospital in Medan City, North Sumatra, and a referral hospital for Medan and the surrounding areas. As a regional general hospital, it provides health services for the people of Medan and its surroundings. Its customer service handles registration and information for patients seeking inpatient or outpatient care, information on doctors' practice schedules, facility service information, patient guarantor cooperation, bad management, and visitor information. Customer service is not yet optimal for patients and visitors: the information provided is limited, it lacks accessibility and clarity, and coordination between hospital departments is weak. To overcome this problem, customer service can use artificial intelligence technology. This research provides a solution by building a chatbot system that serves as an information medium for patients and visitors. The chatbot development process uses a text mining algorithm for text processing and TF-IDF to weight each document available in the database. The system responds with the document that has the highest similarity to the question; with text mining and TF-IDF, the chatbot can provide precise and accurate answers to questions asked by patients and visitors. The final result of this research is a chatbot that patients and visitors can use to find available information, making it easier for them to learn about the services available at RSUD Dr. Pirngadi, Medan City.
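The retrieval step described above (weight each stored document with TF-IDF, then answer with the most similar one) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the FAQ entries and answers are placeholders, cosine similarity is assumed as the similarity measure, and the smoothed IDF formula is one common variant chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ database; the answers are placeholders, not hospital facts.
FAQ = {
    "How do I register as an outpatient?": "Placeholder answer about outpatient registration.",
    "What is the doctor practice schedule?": "Placeholder answer about doctor schedules.",
    "What are the visiting hours?": "Placeholder answer about visiting hours.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the database are ignored."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def build_index(documents):
    """Compute smoothed IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    return idf, [tfidf_vector(tokens, idf) for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(question, faq=FAQ):
    """Return the stored answer whose question is most similar, with its score."""
    questions = list(faq)
    idf, vectors = build_index(questions)
    query = tfidf_vector(tokenize(question), idf)
    scores = [cosine(query, vector) for vector in vectors]
    best = max(range(len(questions)), key=scores.__getitem__)
    return faq[questions[best]], scores[best]


reply, score = answer("When are visiting hours?")
print(reply, round(score, 2))
```

A production system would likely add a minimum-similarity threshold so that unrelated questions receive a fallback message instead of the nearest, possibly irrelevant, answer.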
- Research Article
- 10.1111/jgs.19414
- Feb 21, 2025
- Journal of the American Geriatrics Society
- Lisa Gallicchio + 5 more
The purpose of this study was to describe the characteristics of the NIH-funded grant portfolio focused on cancer and accelerated aging. Research project grants focused on cancer survivors and aging trajectories that were newly funded during fiscal years 2013 through 2023 were identified by first using a text mining algorithm from the NIH Research, Condition, and Disease Categorization (RCDC) system with cancer survivorship-relevant terms and then a list of aging-related terms that included aging, neurocognition, and physical function. Included grants were double coded to extract study characteristics. A total of 166 grants were identified, with the National Cancer Institute (NCI) and National Institute on Aging (NIA) funding 62.0% and 23.5% of the grants, respectively. The number of newly funded grants rose from nine in 2013 to 27 in 2023. Overall, the majority were observational studies (65.1%); 45% included study samples of multiple cancer types. The most commonly examined outcomes were cognitive (54.4%) or physical (37.5%) functioning; 30% of grants incorporated an aging-related biomarker. Few grants focused on racial and ethnic minority (3.0%) or rural cancer survivors (2.4%). This portfolio analysis showed an increase in the number of NIH-funded grants focused on cancer survivors and accelerated aging, but notable gaps are evident. Given the rapidly growing survivor population, many of whom will experience accelerated aging trajectories, there is a critical need to better understand accelerated aging phenotypes and mechanisms, so that those at the highest risk for adverse aging-related effects can be identified and interventions developed.