Articles published on Text Mining
15207 Search results
- Research Article
- 10.1016/j.jhazmat.2026.141934
- May 1, 2026
- Journal of hazardous materials
- Lu Han + 8 more
Text mining and machine learning based health risk prediction for soil polycyclic aromatic hydrocarbons at typical coal-fired industrial sites in China.
- Research Article
- 10.1016/j.foodpol.2026.103055
- May 1, 2026
- Food Policy
- Peng Lu + 4 more
Public perspectives on food date labeling: implications for waste reduction and food safety policy
- Research Article
- 10.1016/j.actpsy.2026.106643
- May 1, 2026
- Acta psychologica
- Erina Murata + 2 more
Identification and quantification of approval desire in social networking service posts and analysis of their linguistic features.
- Research Article
- 10.1016/j.ijmedinf.2026.106321
- May 1, 2026
- International journal of medical informatics
- Majdi Jaradat + 1 more
Explainable AI in Cardiology Diagnostics: A Systematic Review of Machine Learning, Meta-heuristic Optimization, and Clinical Text Mining for Coronary Artery Disease.
- Research Article
- 10.1016/j.eswa.2026.131216
- May 1, 2026
- Expert Systems with Applications
- Iwao Fujino + 2 more
• We use PQk-means to convert AIS trajectories into sequences of code documents.
• We apply TF-IDF to extract a code distribution-based representation of trajectories.
• We compute cosine similarity to find similar trajectories and vessels.
• We apply K-means to cluster voyages and vessels from the proposed representation.
• We use SVM to recognize vessels based on the proposed representation.
Automatic Identification System (AIS) data received from vessels in a maritime area of interest is a valuable resource for understanding vessel behavior and gaining insights into maritime activities. This paper presents a novel approach for representing vessel trajectories using code distribution and analyzing AIS trajectory big data through machine learning techniques. By introducing PQk-means vector quantization algorithms, AIS trajectory data records are transformed into a series of code documents. Applying the TF-IDF (Term Frequency-Inverse Document Frequency) technique from text mining to these code documents produces a code distribution-based representation of vessel trajectories. This preliminary process enables the application of machine learning algorithms to AIS trajectory big data. Using this representation, three types of applications have been developed: detecting similar trajectories and vessels using vector space models and cosine similarity, clustering voyages and vessels with the K-means algorithm, and recognizing vessels with support vector machine algorithms. The potential of the proposed approach is demonstrated through a series of experiments using practical AIS datasets from a region in northwest France. Overall, the experimental results show that the proposed approach is highly effective for mining AIS big data, outperforms other methods, and confirms its ability to handle high-dimensional trajectories and massive amounts of AIS data at a reasonable computational cost.
Moreover, this work provides an opportunity to develop an AIS-oriented version of a large language model based on our code distribution representation of trajectories, and to extend trajectory representation to any type of moving object or numerical vector from diverse sensors.
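A minimal sketch of the TF-IDF and cosine-similarity steps described above, assuming the PQk-means quantization has already turned each trajectory into a list of code tokens. The code labels (`c1`, `c2`, ...) are hypothetical, and the IDF form `log(N / df)` is one common variant, not necessarily the one the authors used.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of code-token lists."""
    n = len(docs)
    # Document frequency: number of trajectories containing each code.
    df = Counter(code for doc in docs for code in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency (relative count) times inverse document frequency.
        vectors.append({code: (c / total) * math.log(n / df[code])
                        for code, c in counts.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(code, 0.0) for code, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical code documents for three trajectories.
trajectories = [
    ["c1", "c2", "c2", "c3"],
    ["c1", "c2", "c3", "c3"],
    ["c7", "c8", "c8", "c9"],
]
vecs = tfidf_vectors(trajectories)
print(round(cosine(vecs[0], vecs[1]), 4))  # shared codes, similar trajectories
print(cosine(vecs[0], vecs[2]))            # no shared codes
```

Because each trajectory becomes a bag of codes, any vector-space tool (cosine ranking, K-means, SVM) can consume these vectors directly, which is the point the abstract makes about enabling machine learning on AIS data.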
- Research Article
- 10.1016/j.eneco.2026.109290
- May 1, 2026
- Energy Economics
- Xiaolei Wang + 3 more
How does inland Free Trade Zones in China trigger industrial green transformation? Evidence from policy text mining
- Research Article
- 10.1080/17467586.2026.2662888
- Apr 24, 2026
- Dynamics of Asymmetric Conflict
- Anton Oleinik
The article examines estimates of enemy military deaths publicly released by combatants as political and media constructs. It shows that estimates of Ukrainian military casualties produced by the Russian Federation’s Ministry of Defense and assessments of Russian military casualties made by Ukraine’s Ministry of Defense were associated more strongly with political and media variables, such as the frequency of mentions of peace in the political leaders’ war-related speeches and Internet users’ interest in specific topics, than with the independently confirmed tallies of Russian soldiers’ deaths. Official estimates of casualties were only selectively associated with the political leaders’ preparedness to continue fighting, as measured using McClelland’s human motivation theory. The association existed in the case of the U.S. presidency, but not in the case of Volodymyr Zelensky and Vladimir Putin. The study was informed by time-series analysis and text mining. The outcomes of mining a unique quadrilingual corpus of political and media discourses on the war, containing 238 million words, were used as input in the ARIMA models, along with several other variables. The scope of the study includes five countries (Russia, Ukraine, the United States, the United Kingdom, and France). It covers the period from January 2022 to August 2025.
- Research Article
- 10.1007/s11540-026-10064-5
- Apr 24, 2026
- Potato Research
- Harun Yonar + 1 more
This study systematically examines the thematic and conceptual evolution of the Potato Research journal between 1970 and 2024 using a multi-layered text mining and topic modelling framework. A total of 1967 articles were analyzed across titles, abstracts, and keywords to capture surface-level and latent themes. Word frequency analysis, trend analysis, co-occurrence networks, and thematic mapping were integrated with Latent Dirichlet Allocation (LDA) and Structural Topic Modeling (STM) to identify dominant research themes and to evaluate their temporal dynamics. In addition, Sustainable Development Goals (SDGs) mapping was conducted using three independent SDG identification frameworks to assess the alignment of potato research with global sustainability agendas. The findings reveal a clear transformation in the journal’s scientific orientation, shifting from an early focus on agronomic production and plant pathology toward sustainability-oriented, climate-resilient, and data-intensive research paradigms. LDA identified five core thematic domains, namely post-harvest pathology, genetic resistance and molecular breeding, abiotic stress and physiological responses, plant growth and productivity, and agricultural management, which were further validated through STM-based inferential analysis. Temporal trends indicate statistically significant increases in themes related to climate change, water management, food quality, and analytical modelling, alongside a relative decline in conventional agronomic practices. SDG mapping demonstrates strong alignment with SDG 2 (Zero Hunger), SDG 3 (Good Health and Well-Being), and particularly SDG 13 (Climate Action). The findings highlight the role of Potato Research as both a historical record of disciplinary development and a scientific publishing platform reflecting sustainability-oriented agricultural research.
- Research Article
- 10.63313/ebm.9178
- Apr 23, 2026
- Economics & Business Management
- Bo Zhang + 1 more
In the context of digital reading and the growing importance of online reviews for publishing decisions, traditional topic planning faces challenges such as information overload and delayed feedback. This study analyzes over 59,000 valid reviews of the top 15 best-selling literary books on Dangdang.com (April 2025). Using a “text mining + multi-dimensional comparison” approach and an optimized DTC framework with a topic mapping layer, we identify 17 core features of literary books via TF-IDF and K-means. Findings show that keywords such as “classic,” “moving,” and “healing” significantly influence topic decisions and can be directly translated into content, author, and format strategies. The study validates the DTC model’s applicability in publishing and provides data-driven support for topic planning.
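To illustrate the K-means step, here is a plain Lloyd's-algorithm sketch over review vectors. It assumes the reviews have already been converted to TF-IDF weights; the three dimensions (read here as the hypothetical terms "classic", "moving", "healing") and the values are illustrative only, and seeding from chosen points is a simplification of the random or k-means++ initialization most libraries use.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical TF-IDF weights over the terms ("classic", "moving", "healing").
reviews = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.0, 0.9], [0.0, 0.1, 0.8]]
labels, centroids = kmeans(reviews, [reviews[0], reviews[2]])
print(labels)
```

Each resulting centroid can be read as a "feature profile": its largest coordinates name the keywords that characterize that cluster of reviews.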
- Research Article
- 10.3390/su18094177
- Apr 22, 2026
- Sustainability
- Qidi Dong + 4 more
In light of the accelerating process of global urbanization, the quality of cultural ecosystem services (CES) in urban parks has become a core metric for efforts to promote urban livability and sustainable cities. However, previous research has not considered the differential impacts of the external environment across various travel scenes. In this study, 32 parks in Chengdu serve as the empirical sample, and public CES perception data are extracted from social media comments via text mining. Based on a unified 15 min time threshold, we delineate the service scope for four travel scenes and employ geographically weighted regression and piecewise regression models to analyze the spatial heterogeneity, driving mechanisms and threshold effects associated with the relationship between external environmental factors and park CES. The findings indicate that the external environment’s influence on CES exhibits a “scene-factor-scale” adaptation pattern. Walking scenes are influenced primarily by land-use and population factors; in contrast, cycling scenes rely on the availability of shared bicycle facilities, and public transport and driving scenes are driven by economic vitality and traffic-support factors, respectively. Five critical thresholds are identified, including a threshold of 40% impervious surface area. This research proposes scene-based optimization strategies and helps enhance the “external environment–travel behavior–spatial characteristics” coupling framework, thereby serving as a scientific reference for efforts to improve 15 min living circles.
- Research Article
- 10.3390/virtualworlds5020019
- Apr 20, 2026
- Virtual Worlds
- Sunghae Jun
Extended reality (XR), encompassing augmented reality (AR), virtual reality (VR), and mixed reality (MR), is a key enabling technology for virtual worlds, and XR-related patents continue to grow rapidly. However, patent-based XR technology analysis faces a fundamental challenge: document–keyword matrices (DKMs) built from patent titles and abstracts are typically high-dimensional and sparse and often exhibit excess zeros, which can distort inference when conventional text mining pipelines are applied without a generative count perspective. In this study, we propose a statistically grounded XR technology analysis framework that combines likelihood-based count modeling with interpretable structure mining to map XR sub-technologies from a patent DKM. Using an XR patent–keyword matrix, we fit Poisson regression (PR), negative binomial regression (NBR), and zero-inflated negative binomial regression (ZINBR) models via maximum likelihood estimation (MLE), controlling for document-length effects. Model selection by the Akaike information criterion (AIC) consistently favored NBR for both target keywords, indicating substantial overdispersion in XR patent counts. We interpret exponentiated coefficients as incidence rate ratios (IRRs) and construct a technology relatedness network from significant IRR edges, revealing a dual-axis XR structure: "reality" is anchored in an AR/VR experience-and-content axis (e.g., "virtual" and "augment"), whereas "extend" is embedded in a structure-and-integration axis (e.g., "surface", "edge", "layer", and connectivity-related terms). To demonstrate how the proposed method applies to a real domain, we retrieved XR patent documents and analyzed them with the framework.
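The post-fitting arithmetic behind this kind of framework is simple to show. The sketch below assumes the PR, NBR, and ZINBR models have already been fitted by MLE in a statistics package; the log-likelihoods, parameter counts, and keyword counts are hypothetical placeholders, not the paper's results. It computes AIC = 2k − 2 ln L for model selection, a quick variance-to-mean overdispersion check, and the IRR as exp(β).

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def incidence_rate_ratio(coef):
    """Exponentiated count-model coefficient: multiplicative change in the expected count."""
    return math.exp(coef)

def dispersion_index(counts):
    """Sample variance divided by mean; values well above 1 suggest overdispersion vs. Poisson."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean

# Hypothetical fitted log-likelihoods and parameter counts (illustrative only).
fits = {"PR": (-1520.4, 3), "NBR": (-1210.7, 4), "ZINBR": (-1209.9, 7)}
scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)

# Hypothetical per-document counts of one keyword: many zeros, a few large values.
keyword_counts = [0, 0, 0, 1, 0, 5, 0, 2, 0, 12]

print(best, scores[best])
print(round(dispersion_index(keyword_counts), 2))
print(round(incidence_rate_ratio(0.405), 3))
```

In this toy setup ZINBR gains almost nothing in log-likelihood over NBR but pays for three extra parameters, so AIC prefers NBR; an IRR near 1.5 would mean a one-unit increase in the predictor multiplies the expected keyword count by about 1.5.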
- Research Article
- 10.3390/data11040089
- Apr 20, 2026
- Data
- Jianfang Gao + 2 more
University data governance is an essential requirement for the informatization of universities and holds significant importance in advancing the modernization of university governance systems and governance capabilities. This study focuses on the data governance policies released by “Double First-Class” universities in China since 2015. Based on policy text mining and the PMC index model, the paper developed an evaluation system for university data governance policies consisting of 9 primary indicators and 43 secondary indicators and conducted quantitative assessment. The results indicate that the policies are of good quality overall, with 25% rated as excellent, 66.1% as good, and 8.9% as moderate. Many universities have made significant progress in formulating data governance policies. However, there is still considerable room for improvement. For example, while the policy objectives are clearly defined, certain aspects require further refinement; stakeholder involvement is relatively narrow and lacks diversity; and the mix of policy instruments is imbalanced. To address these issues, it is recommended that policies be optimized by balancing regulatory priorities, establishing a multi-stakeholder collaborative governance framework, and rationalizing the mix of policy instruments.
- Research Article
- 10.3390/nursrep16040142
- Apr 16, 2026
- Nursing reports (Pavia, Italy)
- Misa Iida + 5 more
Objective: This study aimed to compare factors facilitating shared decision-making (SDM) in renal replacement therapy decision support between physicians and nurses using text mining analysis. Methods: A web-based survey was conducted among 250 physicians and 299 nurses between December 2024 and March 2025. Free-text responses regarding factors facilitating SDM were collected and analyzed using quantitative text analysis. Results: Valid responses were obtained from 103 physicians and 122 nurses. Both groups identified six factors, with three shared conceptual domains across physicians and nurses, reflected in three physician factors and four nurse factors. Common domains included "promoting patient and family understanding", "enhancing staff education", and "strengthening multidisciplinary collaboration". Physicians emphasized structural and environmental factors, such as "establishing clinical systems", "inter-institutional collaboration", and "securing sufficient time". In contrast, nurses highlighted practical and interpersonal aspects, including "understanding patients' values and lifestyles", "supporting patient-centered decision-making", and "promoting team-based information sharing". Conclusions: Factors that facilitate SDM in renal replacement therapy include perspectives common to both physicians and nurses, as well as profession-specific perspectives. These findings suggest that integrating organizational support and clinical skills development is crucial for promoting SDM in clinical settings.
- Research Article
- 10.3390/su18083911
- Apr 15, 2026
- Sustainability
- Zhan Shi + 1 more
Against the backdrop of the comprehensive advancement of law-based governance in China and the “dual carbon” strategic goals, existing research still lacks a systematic discussion of how corporate compliance management affects ESG performance, and few studies have constructed targeted compliance management indicators from a textual perspective. To fill this research gap, this paper aims to explore the influence of corporate compliance management on ESG performance. Using Chinese A-share listed firms on the Shanghai and Shenzhen Stock Exchanges from 2010 to 2023 as research samples, this study adopts text mining techniques, combined with a panel regression model and a mediating effect model, to construct an indicator of corporate compliance management and examine its impact on ESG performance. The empirical results show that compliance management significantly improves corporate ESG performance and functions mainly through three channels: promoting corporate green innovation, fostering corporate ethical culture, and reducing agency costs. Heterogeneity tests indicate that the positive relationship is more pronounced in state-owned enterprises and in firms with higher managerial power. Further analysis reveals that compliance management also helps reduce the divergence in ESG ratings among Chinese firms, and the construction of all dimensions of compliance management can contribute to the improvement of corporate ESG performance. These findings enrich the literature on the economic consequences of compliance management and the determinants of ESG performance and provide theoretical guidance for Chinese firms to enhance ESG performance via compliance management.
- Research Article
- 10.3390/oral6020044
- Apr 14, 2026
- Oral
- Jose Ramon Herrera + 7 more
Background/Objectives: Superficial oral mucosal (SOM) lesions are prevalent among patients with Sjögren’s disease (SjD) due to mucosal dryness. Given the limited evidence on screening and referral for SOMs, and the presence of relevant information only in dental clinical notes, a natural language processing (NLP) pipeline was developed to screen for SOMs among SjD patients. This retrospective study analyzed dental clinical notes from 180 linked electronic dental and health records, including patients both with and without a diagnosis of SjD. Materials and Methods: An annotation schema with four classes (SOMs, signs and symptoms of dry mouth, treatment for xerostomia, referral to specialists) was inductively created using the Extensible Human Oracle Suite of Tools (eHOST) to manually annotate clinical notes. Relevant keyterms were retrieved using a rule-based approach with Python’s Natural Language Toolkit (NLTK). SjD and control groups were compared using Fisher’s Exact tests. Four annotators reviewed ninety-three records. Results: SjD patients (mean age 54.8 ± 11.7 years) had fewer total visits across 15 years but had more dental visits per year (10.2 ± 13.3) than controls. SjD patients were more likely to have oral candidiasis (p = 0.041), exhibit signs and symptoms of dry mouth (p = 0.004), receive treatments for xerostomia (p < 0.001), be treated with cholinergic agonists (p = 0.005), and be referred to a specialist (p = 0.046), although differences were not significant for all SOMs. Additionally, SjD patients had a higher proportion of sialadenitis (p = 0.045), rheumatoid arthritis (p = 0.001), systemic lupus erythematosus (p < 0.001), myalgia/myositis/fibromyalgia (p = 0.010), and anxiety/nervousness (p = 0.004). Conclusions: These findings support the feasibility of using text mining of dental clinical notes to screen for and manage oral conditions.
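The group comparisons above rely on Fisher's exact test. As a self-contained sketch (the study's own tooling is not specified beyond the test name), here is a two-sided Fisher's exact test for a 2×2 table using only `math.comb`. It sums the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one; `scipy.stats.fisher_exact` is the usual library alternative. The example table is the classic "lady tasting tea" layout, not data from this study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of all tables that are
    no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability of a table whose top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

# Hypothetical table: rows = SjD / control, columns = finding present / absent.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4))
```

The exact test suits this setting because some of the compared findings (for example, specific lesion types) produce small cell counts, where chi-squared approximations become unreliable.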
- Research Article
- 10.1111/ijd.70418
- Apr 14, 2026
- International journal of dermatology
- Lachlan D W Lau + 6 more
Artificial intelligence (AI) is being increasingly used in dermatology education and research as digital health data expands and large language models (LLMs) advance. This scoping review synthesized current applications, benefits, and limitations of AI in these domains. The review followed PRISMA-ScR methodology, including 102 studies published between 2010 and 2025, with 28 studies examining educational applications and 74 examining research applications. Educational applications included the use of LLMs for examination preparation, question and case generation, and image-based learning through generative and adaptive imaging tools. Research applications included machine learning and natural language processing for large-scale data analysis, pharmacovigilance, social media and clinical text mining, predictive modeling, biomarker and gene-signature discovery, and the use of LLMs to support literature synthesis, manuscript writing, and research workflow tasks. Across education and research, key limitations concerned accuracy, bias, transparency, and ethical governance. These issues highlight the need for ongoing human oversight, the use of dermatology-specific training datasets, and structured implementation frameworks. Despite these considerations, AI has substantial potential to enhance dermatology learning and improve dermatologic research efficiency. Future work should focus on evaluating real-world performance, model reliability, and the effectiveness of human-AI collaboration in dermatology practice and training.
- Research Article
- 10.55041/ijcope.v2i4.313
- Apr 13, 2026
- International Journal of Creative and Open Research in Engineering and Management
- Sanjeeb Kumar Nayak + 5 more
This paper presents a hybrid deep learning approach for personality trait classification from textual data. With the rapid growth of social media platforms, analyzing personality traits from user-generated text has become an important research area in natural language processing. The proposed system combines Convolutional Neural Networks (CNN) for effective feature extraction and Long Short-Term Memory (LSTM) networks for capturing contextual and sequential dependencies in text. The model utilizes TF-IDF for feature representation and is trained and evaluated on a labeled personality dataset based on standard personality traits. Extensive preprocessing techniques, including text cleaning, tokenization, and normalization, are applied to improve data quality and model performance. Experimental results show that the proposed CNN–LSTM model achieves an accuracy of 98%, outperforming traditional machine learning models such as Support Vector Machine (56%), Random Forest (53%), and K-Nearest Neighbors (31%). The improved performance of the hybrid model is attributed to its ability to learn both local semantic features and long-term contextual relationships in textual data. Furthermore, the model demonstrates strong generalization capability and robustness when applied to unseen data. The results indicate that the proposed approach is highly effective for real-world applications such as personalized recommendation systems, mental health analysis, user behavior prediction, and human-computer interaction. Keywords — Personality Trait Classification; Deep Learning; CNN-LSTM; Natural Language Processing; Text Mining; Machine Learning.
- Research Article
- 10.46647/icetetas173
- Apr 13, 2026
- Research Digest on Engineering Management and Social Innovations
- K Mudduswamy + 2 more
High-dimensional data clustering poses significant challenges due to the curse of dimensionality, noise, and sparsity. Traditional clustering algorithms often struggle with scalability and accuracy in such contexts. To address these issues, this paper proposes a hybrid clustering model that integrates Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Gaussian Mixture Models (GMM), leveraging the strengths of both approaches using machine learning techniques. DBSCAN efficiently identifies dense regions and eliminates noise, while GMM provides probabilistic soft clustering suitable for overlapping data distributions. By first using DBSCAN for pre-processing and noise reduction, followed by GMM for refined clustering, the hybrid model enhances performance in complex high-dimensional datasets. Dimensionality reduction techniques such as PCA and t-SNE are also incorporated to visualize and improve cluster quality. The proposed method is evaluated on benchmark high-dimensional datasets and compared against standalone clustering algorithms. Results demonstrate improved cluster compactness, separation, and computational efficiency, showcasing the effectiveness of the hybrid approach for high-dimensional data analysis in fields such as bioinformatics, image processing, and text mining.
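A toy, one-dimensional sketch of the two-stage idea (DBSCAN for noise removal, then a Gaussian mixture for soft clustering). This is not the paper's implementation: it works on scalar data rather than high-dimensional vectors, omits the PCA/t-SNE step, and uses hand-picked hypothetical values for `eps`, `min_pts`, and the initial means.

```python
import math

def dbscan_noise(points, eps, min_pts):
    """Flag points DBSCAN would label as noise (1-D case).

    A core point has at least min_pts points (itself included) within eps;
    a point is noise if it is neither core nor within eps of some core point.
    """
    n = len(points)
    core = [sum(1 for q in points if abs(p - q) <= eps) >= min_pts for p in points]
    return [
        not core[i] and not any(core[j] and abs(points[i] - points[j]) <= eps for j in range(n))
        for i in range(n)
    ]

def gmm_em_1d(xs, init_means, iters=100):
    """Fit a 1-D Gaussian mixture by expectation-maximisation."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: soft responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [
                weights[j] * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
                / math.sqrt(2 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapsing component
    return weights, means, variances

# Hypothetical data: two dense groups plus one outlier at 20.0.
data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2, 20.0]
noise = dbscan_noise(data, eps=0.35, min_pts=3)
clean = [x for x, is_noise in zip(data, noise) if not is_noise]
weights, means, variances = gmm_em_1d(clean, init_means=[0.0, 6.0])
print([round(m, 3) for m in means])
```

Removing the outlier first matters for the mixture step: left in, a single far-away point would either drag a component mean toward it or inflate that component's variance.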
- Research Article
- 10.3390/su18083842
- Apr 13, 2026
- Sustainability
- Nikolay Dragomirov + 2 more
This study examines both the historical context and future projections of sustainable supply chain research practices. It emphasizes the need for more advanced analyses of research articles that combine traditional analysis with modern topic modeling and forecasting techniques. This study is organized around four primary research questions. A dataset of n = 8955 indexed article keywords and abstracts for the period 2000–2025 was analyzed in the Python (version 3.12) environment using n-grams, top keywords by year, k-means clustering combined with dimensionality reduction, and co-occurrence networks. Time-series forecasting models were also used to project the short-term development of clusters. The dataset was retrieved using a search string and subject-area filters to focus the analysis on managerial and economic perspectives of sustainable supply chains. The analysis identified four keyword clusters: (1) CSR and Stakeholder Engagement, (2) Circular Economy and Sustainable Production, (3) Decision-making, Resilience and Emerging Technologies, and (4) Green Supply Chain Management. These clusters were then examined to assess current research practices from a managerial and economic perspective and their near-term evolution, with results validated through additional clustering of abstract-level topics. This study confirms a paradigm shift toward the integration of circularity, digitalization, and resilience, with technology-enabled growth. Social sustainability remains underrepresented, revealing a critical gap in current research. This study contributes methodologically by updating and extending current research practices and theoretically by revealing trends in sustainability problems in supply chains.
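Two of the listed techniques, n-gram extraction and keyword co-occurrence networks, can be sketched with the Python standard library alone. The keyword lists below are hypothetical; a real pipeline would also normalize keywords (case, plurals, synonyms) before counting, and the edge weights here are raw co-occurrence counts rather than any normalized association measure.

```python
from collections import Counter
from itertools import combinations

def ngrams(tokens, n):
    """Contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cooccurrence(keyword_lists):
    """Weighted edge list: how many articles list each pair of keywords together."""
    edges = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each undirected pair has one canonical key.
        for u, v in combinations(sorted(set(keywords)), 2):
            edges[(u, v)] += 1
    return edges

# Hypothetical author keywords for three articles.
articles = [
    ["circular economy", "resilience", "blockchain"],
    ["circular economy", "resilience"],
    ["csr", "stakeholder engagement"],
]
edges = cooccurrence(articles)
print(edges.most_common(1))
print(ngrams("green supply chain management".split(), 2))
```

The resulting edge weights can be loaded into a graph library for community detection or layout, which is how co-occurrence clusters like the four reported above are usually visualized.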
- Research Article
- 10.1080/00207543.2026.2652547
- Apr 9, 2026
- International Journal of Production Research
- Tao Zhu + 3 more
With the rapid rise of social e-commerce, an increasing number of consumers now purchase goods through social community networks. However, substitution effects arising from limited inventories and stockouts can seriously impair demand forecasting accuracy. To address this challenge, we propose XAI-Sub, an interpretable demand forecasting framework that systematically incorporates domain knowledge via text mining and biclustering to improve both transparency and predictive performance. By aggregating sparse sales records into substitution-aware clusters and estimating scenario-specific substitution matrices using a novel alternating minimisation algorithm, XAI-Sub explicitly quantifies substitutive relationships while preserving interpretability. Validated on data from a large community group-buying (CGB) platform, our framework achieves a relative improvement of 38% in forecasting accuracy compared with conventional methods. The approach introduces three key advancements in explainable artificial intelligence (XAI): (1) domain-knowledge anchoring: text-derived semantic features anchor substitution patterns within business logic, facilitating human-AI alignment; (2) scenario-driven interpretability: biclustering decomposes demand dynamics into actionable substitution typologies; and (3) causal pathway visualisation: the substitution matrix serves as an interpretable interface, delineating demand redistribution pathways across clusters. This work demonstrates how formalising domain knowledge can bridge the ‘explanation gap’ in complex demand systems and provides CGB enterprises with practical tools to audit and exploit substitution mechanisms.