Articles published on Latent Dirichlet allocation
5229 Search results
- Research Article
- 10.1016/j.ecoinf.2026.103725
- May 1, 2026
- Ecological Informatics
- Elina Takola
From LSA to LLM: Evolution and limitations of topic modelling methods for biodiversity conservation
- Research Article
- 10.1016/j.amjoto.2026.104825
- May 1, 2026
- American journal of otolaryngology
- Wei Liu + 8 more
Malignant temporal bone tumors (1941-2025): A bibliometric analysis of publication trends, key contributors, and thematic evolution.
- Research Article
- 10.1061/jcemd4.coeng-16965
- May 1, 2026
- Journal of Construction Engineering and Management
- Pan Zhang + 3 more
Prioritizing accurate cost contingency estimation through risk identification is essential for the success of construction projects. Traditional methods for identifying and classifying risk factors, such as workshops, interviews, and referencing similar projects, are predominantly manual, subjective, and time-consuming. To overcome these challenges, this study introduces a novel deep learning approach that leverages the BERTopic algorithm to extract cost-related risk factors from extensive project risk registers. The methodology consists of three key steps: (1) identifying risk factor topics; (2) visualizing topics, documents, and terms; and (3) revealing dynamic features of the topics. The effectiveness and practicality of this approach were demonstrated using risk register data from 277 public works projects in Hong Kong, with a comparative analysis against traditional topic modeling techniques, such as latent Dirichlet allocation (LDA) and Top2Vec. This analysis, validated by a panel of project planning experts, successfully identified critical cost-related risk factors, such as design changes, market conditions, project delays, and underground conditions. The findings offer valuable insight for project planners, enabling more effective assessment and prioritization of cost risk factors in future construction projects.
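BERTopic describes each document cluster with class-based TF-IDF (c-TF-IDF), which merges all documents in a cluster into one class document before weighting terms. The sketch below follows the weighting given in the BERTopic documentation, tf × log(1 + A / f), where A is the average number of tokens per class and f is a term's total frequency across classes. Treat it as an approximation rather than the library's own implementation; the function names and risk-register phrases are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(classes):
    """Class-based TF-IDF weights for each term in each class.

    `classes` maps a class (topic cluster) name to its list of documents.
    All documents of a class are merged into one bag of words, then
    weight(t, c) = tf(t, c) / len(c) * log(1 + A / f(t)),
    where A is the average token count per class and f(t) is the total
    count of term t across all classes.
    """
    counts = {
        name: Counter(tok for doc in docs for tok in doc.lower().split())
        for name, docs in classes.items()
    }
    total = Counter()
    for cnt in counts.values():
        total.update(cnt)
    avg_tokens = sum(sum(cnt.values()) for cnt in counts.values()) / len(counts)
    weights = {}
    for name, cnt in counts.items():
        n_tokens = sum(cnt.values())
        weights[name] = {
            term: (n / n_tokens) * math.log(1 + avg_tokens / total[term])
            for term, n in cnt.items()
        }
    return weights


def top_terms(weights, k=3):
    """The k highest-weighted terms of every class."""
    return {name: sorted(w, key=w.get, reverse=True)[:k] for name, w in weights.items()}


# Hypothetical clusters of risk-register entries (not data from the study).
clusters = {
    "design": ["design change requested by client", "late design change"],
    "ground": ["unforeseen underground utilities", "poor ground conditions underground"],
}
print(top_terms(c_tf_idf(clusters), k=2))
```

Terms repeated within one cluster but rare elsewhere (such as "underground") rise to the top, which is what makes c-TF-IDF useful for labeling clusters.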
- Research Article
- 10.1016/j.jss.2025.112748
- May 1, 2026
- Journal of Systems and Software
- Mumtahina Ahmed + 4 more
• Analyzed 25,302 questions on Mocking from StackOverflow. • Applied LDA for topic modelling and pyLDAvis for topic visualizations. • Identified 30 topics, performed categorization, constructed topic hierarchy. • Analyzed category and topic-wise question trends, question types, Q&A popularity and difficulty. Mocking is a common unit testing technique that is used to simplify tests, reduce flakiness, and improve coverage by replacing real dependencies with simplified implementations. Despite its widespread use in Open Source Software (OSS) projects, there is limited understanding of how and why developers use mocks and the challenges they face. In this study, we have analyzed 25,302 questions related to Mocking on StackOverflow to identify the challenges faced by developers. We used Latent Dirichlet Allocation (LDA) for topic modeling, identified 30 key topics, and grouped the topics into five key categories. We then analyzed the annual and relative probabilities of each category to understand the evolution of mocking-related discussions. Trend analysis reveals that categories such as Mocking Techniques and External Services have remained consistently dominant, highlighting evolving developer priorities and ongoing technical challenges. While questions in the Theoretical category declined after 2010, posts regarding Error Handling grew notably from 2009. Our findings also show an inverse relationship between a topic’s popularity and its difficulty. Popular topics like Framework Selection tend to have lower difficulty and faster resolution times, while complex topics like HTTP Requests and Responses are more likely to remain unanswered and take longer to resolve. Additionally, we evaluated questions based on answer status (successful, ordinary, or unsuccessful) and found that topics such as Framework Selection have higher success rates, whereas tool setup and Android-related issues are more often unresolved.
A classification of questions into How, Why, What, and Other revealed that over 64% are How questions, particularly in practical domains like file access, APIs, and databases, indicating a strong need for implementation guidance. Why questions are more prevalent in error-handling contexts, reflecting conceptual challenges in debugging, while What questions are rare and mostly tied to theoretical discussions. These insights offer valuable guidance for improving developer support, tooling, and educational content in the context of mocking and unit testing.
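The abstract mentions the annual and relative probabilities of each category without giving a formula. One plausible formulation, assumed here, averages each question's LDA topic probabilities per category over all questions posted in a year. The topic-to-category mapping and the topic distributions below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from LDA topic index to a category named in the abstract.
TOPIC_TO_CATEGORY = {0: "Mocking Techniques", 1: "Error Handling", 2: "Theoretical"}


def yearly_category_share(questions):
    """Average per-year share of each category.

    `questions` is an iterable of (year, topic_probabilities) pairs, where
    topic_probabilities is one question's LDA document-topic distribution.
    Probabilities are summed per category and divided by the number of
    questions posted that year, so each year's shares sum to 1.
    """
    sums = defaultdict(Counter)
    n_per_year = Counter()
    for year, probs in questions:
        n_per_year[year] += 1
        for topic, p in enumerate(probs):
            sums[year][TOPIC_TO_CATEGORY[topic]] += p
    return {
        year: {cat: s / n_per_year[year] for cat, s in cats.items()}
        for year, cats in sorted(sums.items())
    }


sample = [  # hypothetical questions: (year, topic distribution)
    (2009, [0.7, 0.1, 0.2]),
    (2009, [0.5, 0.3, 0.2]),
    (2010, [0.2, 0.7, 0.1]),
]
for year, shares in yearly_category_share(sample).items():
    print(year, {cat: round(v, 2) for cat, v in shares.items()})
```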
- Research Article
- 10.1016/j.jafr.2026.102791
- May 1, 2026
- Journal of Agriculture and Food Research
- Sneha Pandey + 2 more
Rice is fundamental to global food and nutritional security, yet the evolution of rice-related research has not been systematically mapped at scale. This study applied an AI-driven bibliometric framework to 99,011 peer-reviewed articles indexed in Scopus (1970 to 2024), integrating natural language processing using term frequency-inverse document frequency (TF-IDF feature extraction) with topic modelling via Latent Dirichlet Allocation (LDA) and network analysis using graph-based clustering. This enabled both thematic structuring of research and identification of global collaboration patterns. Five dominant knowledge domains emerged: (1) soil contamination and heavy metal uptake in rice systems, (2) agricultural productivity and environmental impact, (3) nutritional and functional applications of rice by-products, (4) genotypic diversity and stress adaptation, and (5) genomic and molecular strategies for rice improvement. Temporal dynamics revealed a shift from agronomic yield and soil management research (1970s to 1990s) toward molecular genetics, stress resilience and environmental sustainability in the post-2000 era, with nutritional functionality and by-product utilization emerging only in the last decade. Collaboration mapping identified India, China and Japan as the primary research hubs in Asia, while Western institutions frequently connected regional clusters. Although progress was achieved, thematic compartmentalization remained, with limited interdisciplinary collaboration across molecular, agronomic and nutritional domains. By integrating machine learning (ML) and large-scale bibliometrics, this study provides the first systems-level evidence base of rice science, aimed at prioritizing areas for cross-disciplinary research and policy engagement to enhance and accelerate innovations towards resilient, sustainable and nutrition-sensitive food systems.
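The abstract pairs TF-IDF feature extraction with LDA but does not state which TF-IDF variant was used. A minimal sketch of the unsmoothed textbook form, tf × log(N / df), on hypothetical titles; note that scikit-learn's `TfidfVectorizer` uses a smoothed IDF by default, so its numbers would differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """One {term: weight} dict per document.

    tf = term count / document length; idf = log(N / df), where N is the
    number of documents and df the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    result = []
    for tokens in tokenized:
        counts = Counter(tokens)
        result.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return result


abstracts = [  # hypothetical titles, not records from the Scopus corpus
    "rice yield and soil management",
    "rice genome editing for stress tolerance",
    "rice bran as a functional food",
]
for weights in tf_idf(abstracts):
    print({term: round(w, 3) for term, w in weights.items()})
```

Because "rice" occurs in every document, its IDF is log(1) = 0 and it carries no weight, while terms unique to one document score highest.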
- Research Article
- 10.1016/j.jrurstud.2026.104086
- May 1, 2026
- Journal of Rural Studies
- Xiangyu Li + 1 more
Remote rural communities often remain trapped in asset-based poverty because rural land functions as “dead capital” that cannot be easily monetized for more profitable uses. One potential solution is transferable development rights (TDR), a market-based redistribution instrument that monetizes rural development quotas and channels part of urban expansion gains to disadvantaged rural areas. Yet evidence on whether TDR alleviates poverty is mixed, and prior research has emphasized material outcomes while paying less attention to the social and political processes that generate unequal outcomes and to spatial heterogeneity within rural areas. We therefore apply a trivalent spatial justice framework—distributive, procedural, and recognitional justice—to assess China's TDR and explain why impacts differ between remote hinterland and peri-urban sending areas. By applying Latent Dirichlet Allocation topic modeling and spatial analysis to examine online citizen–government interactions from a Chinese participatory platform, we find that while TDR programs provide short-term economic gains for rural residents, these gains are frequently offset by longer-term livelihood losses. Procedural and recognitional injustices are central: a government-centered alliance marginalizes farmers' voices, while relocation reshapes landscapes, erodes rural culture, and reproduces discrimination. Moreover, these justice outcomes are spatially uneven—peri-urban areas exhibit stronger rights-claiming capacity and relatively better distributive outcomes, whereas remote areas face deeper constraints and greater livelihood risks. We conclude that poverty reduction cannot rely on land reform alone. The path to revitalizing the countryside lies in institutional reforms, particularly in rural political governance and the empowerment of rural communities. 
• This study uses a spatial justice framework to assess the effectiveness of transferable development rights on revitalizing rural land in Guangdong Province, China. • We apply a topic modeling algorithm to analyze citizen-government interactions on an online participation platform. • The transferable development rights program often delivers short-term monetary compensation, yet is frequently associated with under-cultivated/idle land and longer-term livelihood insecurity. • Rural land reform-oriented solutions to poverty alleviation must involve institutional reforms, particularly in rural political governance and the empowerment of local communities. • Justice outcomes are spatially uneven. Remote hinterlands face structural constraints and suffer from long-term livelihood losses, whereas peri-urban areas benefit from higher administrative capacity, thereby securing relatively better distributive outcomes.
- Research Article
- 10.22266/ijies2026.0430.13
- Apr 30, 2026
- International Journal of Intelligent Engineering and Systems
Traditional topic modeling methods, such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), are limited by their context ignorance, static nature, and low interpretability. Building upon the hybrid approach LDA+NMF+class-based Term Frequency-Inverse Document Frequency (c-TF-IDF), a new formalized framework, Dynamic Contextual Topic Modeling with Large Language Model (LLM) Refinement (DCTM-LLM), is presented. This LLM-refined framework integrates transformer embeddings to detect dynamic semantic clusters and uses an LLM to refine those clusters and synthesize high-level narratives. Experiments on a corpus of 35,000 arXiv abstracts (cs.AI, Computer Science: Artificial Intelligence, 2015-2025) showed that DCTM-LLM achieves a Normalized Pointwise Mutual Information (NPMI) of 0.53, a Silhouette score of 0.62, an Adjusted Rand Index (ARI) of 0.55, and a Topic Diversity at 10 of 0.88. Crucially, with a BERTScore F1 (a score based on Bidirectional Encoder Representations from Transformers, BERT) of 0.89, the method significantly outperforms Dynamic BERTopic (0.62) and the hybrid LDA, NMF, and c-TF-IDF approach (0.65). Thus, the proposed approach shifts the paradigm of topic modeling from keyword extraction toward automated knowledge synthesis.
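NPMI and Topic Diversity at 10 are standard topic-quality metrics. The abstract does not say which reference corpus or co-occurrence window was used for NPMI, so this sketch assumes whole-document co-occurrence on a toy corpus; the function names and data are illustrative. Topic Diversity at k is the share of unique words among the top-k words of all topics.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, docs):
    """Mean NPMI over all pairs of a topic's top words.

    Probabilities are document frequencies in `docs` (a list of token sets):
    NPMI(a, b) = log(p_ab / (p_a * p_b)) / -log(p_ab).
    Pairs that never co-occur get -1 (a common convention); pairs present
    in every document get 1, the limit for perfect co-occurrence.
    """
    n = len(docs)

    def prob(*words):
        return sum(all(w in doc for w in words) for doc in docs) / n

    scores = []
    for a, b in combinations(top_words, 2):
        p_ab = prob(a, b)
        if p_ab == 0:
            scores.append(-1.0)
        elif p_ab == 1:
            scores.append(1.0)
        else:
            scores.append(math.log(p_ab / (prob(a) * prob(b))) / -math.log(p_ab))
    return sum(scores) / len(scores)


def topic_diversity(topics, k=10):
    """Share of unique words among the top-k words of all topics."""
    words = [w for topic in topics for w in topic[:k]]
    return len(set(words)) / len(words)


docs = [  # hypothetical reference corpus of abstracts as token sets
    {"neural", "network", "training"},
    {"neural", "network", "graph"},
    {"reinforcement", "learning", "agent"},
    {"reinforcement", "learning", "reward"},
]
print(npmi_coherence(["neural", "network"], docs))  # always appear together
print(npmi_coherence(["neural", "agent"], docs))    # never appear together
print(topic_diversity([["neural", "network"], ["network", "graph"]], k=2))
```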
- Research Article
- 10.55041/ijsrem61408
- Apr 27, 2026
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
- Ramveer Singh + 4 more
Online platforms such as YouTube generate vast volumes of user-generated textual content that reflect public opinion, emotions, and behavioural patterns. Analysing this content is essential for understanding audience sentiment, dominant discussion themes, and the presence of toxic or harmful speech. Existing Natural Language Processing (NLP) approaches typically address sentiment analysis, topic modeling, and toxicity detection as independent tasks, resulting in fragmented insights and increased system complexity. This paper presents The Public Pulse, an integrated and interpretable NLP framework that unifies sentiment analysis, topic modeling, and toxicity detection within a single analytical pipeline. The system processes YouTube comments using lexicon-based sentiment analysis, Latent Dirichlet Allocation (LDA) for topic extraction, and rule-based toxicity detection, with results visualised through an interactive Streamlit dashboard. Experimental results demonstrate that the proposed approach provides coherent insights into public discourse while remaining computationally efficient and suitable for academic and resource-constrained environments. Key Words: Sentiment Analysis, Topic Modeling, Toxicity Detection, YouTube Comments, NLP, LDA, TextBlob
- Research Article
- 10.1007/s11540-026-10064-5
- Apr 24, 2026
- Potato Research
- Harun Yonar + 1 more
This study systematically examines the thematic and conceptual evolution of the Potato Research journal between 1970 and 2024 using a multi-layered text mining and topic modelling framework. A total of 1967 articles were analyzed across titles, abstracts, and keywords to capture surface-level and latent themes. Word frequency analysis, trend analysis, co-occurrence networks, and thematic mapping were integrated with Latent Dirichlet Allocation (LDA) and Structural Topic Modeling (STM) to identify dominant research themes and to evaluate their temporal dynamics. In addition, Sustainable Development Goals (SDGs) mapping was conducted using three independent SDG identification frameworks to assess the alignment of potato research with global sustainability agendas. The findings reveal a clear transformation in the journal’s scientific orientation, shifting from an early focus on agronomic production and plant pathology toward sustainability-oriented, climate-resilient, and data-intensive research paradigms. LDA identified five core thematic domains, namely post-harvest pathology, genetic resistance and molecular breeding, abiotic stress and physiological responses, plant growth and productivity, and agricultural management, which were further validated through STM-based inferential analysis. Temporal trends indicate statistically significant increases in themes related to climate change, water management, food quality, and analytical modelling, alongside a relative decline in conventional agronomic practices. SDG mapping demonstrates strong alignment with SDG 2 (Zero Hunger), SDG 3 (Good Health and Well-Being), and particularly SDG 13 (Climate Action). The findings highlight the role of Potato Research as both a historical record of disciplinary development and a scientific publishing platform reflecting sustainability-oriented agricultural research.
- Research Article
- 10.1038/s41598-026-48162-6
- Apr 24, 2026
- Scientific reports
- Qi Shasha
Understanding consumer behavior in the context of online shopping is critical for businesses to adapt to evolving market trends. Customer reviews serve as a rich source of information reflecting consumer sentiments and preferences. Sentiment analysis of these reviews has become a powerful tool to uncover underlying consumer emotions and purchasing trends. However, traditional methods relying on shallow lexical features and classical machine learning algorithms often fall short in capturing the intricate and contextual patterns present in textual data. In this study, we propose the use of the large language model RoBERTa-Large to enhance sentiment classification performance by leveraging its advanced contextual embeddings and attention mechanisms. This approach enables the capture of complex semantic relationships beyond surface-level word frequencies. Alongside sentiment analysis, we apply topic modeling using Latent Dirichlet Allocation (LDA) on publicly available datasets to identify prevalent themes and topics within consumer feedback. We perform a comprehensive comparison of RoBERTa against traditional machine learning and ensemble models using TF-IDF features, as well as deep learning architectures utilizing sentence embeddings and transformer-based models. Experimental results demonstrate that RoBERTa-Large achieves the highest accuracy of 93.59%, significantly outperforming baseline models. To enhance model transparency and trustworthiness, we apply SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) interpretability techniques, providing meaningful explanations of model predictions at both global and local levels.
- Research Article
- 10.3390/oceans7030037
- Apr 24, 2026
- Oceans
- Dimitris Klaoudatos + 3 more
The Mediterranean Sea is both a global biodiversity hotspot and the world’s most heavily invaded marine region, where non-indigenous species arrivals are accelerating under intensifying shipping, Suez Canal traffic, aquaculture, and climate warming. Yet, despite rapidly growing research activity, a comprehensive synthesis of the scientific literature on Mediterranean marine invasions has been lacking. This study provides the first Mediterranean-wide combined bibliometric and topic-modeling analysis of invasive marine species research, using 3521 unique documents retrieved from Scopus and Web of Science. We quantify temporal growth in publications and citations, map the conceptual structure of the field through co-citation, co-word, and topic modeling, and reveal pronounced regional and thematic biases. Latent Dirichlet Allocation resolves 13 coherent topics, dominated by first records of non-native species, invasive macroalgae, alien species diversity, and ecological impacts, with strong signals for Lessepsian migration and climate-driven range shifts, particularly in the Eastern Mediterranean. Spatial and thematic analyses reveal pronounced regional biases, with invasion hotspots in the Aegean and Levantine seas contrasted by comparatively sparse coverage of western and central sub-basins, and notable gaps in predictive modeling and socioeconomic assessments. The results underscore the need to rebalance effort toward under-studied regions and themes, while leveraging existing collaboration networks and methodological advances to support MSFD (Marine Strategy Framework Directive) implementation, International Maritime Organization (IMO) instruments, and broader ecosystem-based management. The reproducible framework presented here offers a baseline for periodically tracking research evolution and guiding adaptive, transboundary governance of Mediterranean marine bio-invasions.
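The abstract does not say how the 13-topic LDA model was fitted. As a generic illustration of how LDA infers topics, not the authors' pipeline, here is a compact collapsed Gibbs sampler, one standard inference method for LDA. The hyperparameters (alpha = 0.1, beta = 0.01), the iteration count, and the toy corpus are assumptions chosen for readability; real analyses typically rely on a library implementation.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to `docs` (lists of tokens) with collapsed Gibbs sampling.

    Returns (ndk, nkw): topic counts per document and word counts per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * n_topics for _ in docs]               # document -> topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic -> word counts
    nk = [0] * n_topics                                # tokens per topic
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]  # random start

    def update(d, w, k, delta):
        ndk[d][k] += delta
        nkw[k][w] += delta
        nk[k] += delta

    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the token's current assignment
                # Unnormalized full conditional p(z_i = k | all other assignments).
                weights = [
                    (ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, w, z[d][i], +1)
    return ndk, nkw


def top_words(nkw, n=3):
    """The n most frequent words assigned to each topic."""
    return [sorted(counts, key=counts.get, reverse=True)[:n] for counts in nkw]


theme_a = "invasive macroalgae spread invasive macroalgae coast".split()
theme_b = "lessepsian fish first record lessepsian fish".split()
corpus = [theme_a, theme_b] * 3  # hypothetical toy corpus
doc_topics, topic_words = lda_gibbs(corpus, n_topics=2)
print(top_words(topic_words))
```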
- Research Article
- 10.55041/isjem.acme057
- Apr 21, 2026
- International Scientific Journal of Engineering and Management
- Vathaluru Seshannagari Himabindhu + 4 more
The Dark Web is a hidden portion of the internet accessible only via specialized software like Tor, offering anonymity for both legal privacy needs and illegal activities such as drug sales and hacking forums. It serves as an anonymous haven for cyber threats including malware trading, hacking forums, and illicit marketplaces, complicating textual classification amid noisy, voluminous data. Existing methods integrate Latent Dirichlet Allocation (LDA) topic modeling weights with TextCNN, preprocessing Dark Web texts to derive class-specific keywords, reducing vector dimensionality approximately 300-fold and achieving superior accuracy on DUTA-10k (25 classes) and CoDA (10 classes) over SVM, Naive Bayes, and prior benchmarks. Despite outperforming baselines, limitations persist: dependency on static datasets neglects dynamic content shifts; variable keyword tuning arises from class overlaps; real-time processing is absent; and separate components obscure neural interpretability. This paper proposes a unified deep learning architecture embedding topic modeling directly into TextCNN for real-time classification, dynamically pruning irrelevant terms while exposing neural influences via integrated keyword analysis. Key benefits include rapid threat detection for operational cybersecurity, enhanced explainability bridging probabilistic weights and deep features, reduced hyperparameter sensitivity for robust generalization, and scalable deployment across evolving Dark Web landscapes, advancing automated intelligence gathering. Key Words: Dark Web, Latent Dirichlet Allocation (LDA), real-time classification, generalization, TextCNN, operational cybersecurity.
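The abstract describes using LDA topic weights to derive class-specific keywords that shrink the input vectors. The exact selection rule is not given; as a hypothetical illustration, this sketch keeps the top-n words of each class's topic and counts only those words, so vector length drops from the full vocabulary size to at most n × (number of classes). Function names and weights are invented.

```python
def class_keywords(topic_word_weights, n=2):
    """Reduced vocabulary: the n highest-weighted words of each class's topic.

    `topic_word_weights` maps a class label to {word: LDA topic-word weight}.
    Words shared by several classes are kept once, in first-seen order.
    """
    vocab = []
    for weights in topic_word_weights.values():
        for word in sorted(weights, key=weights.get, reverse=True)[:n]:
            if word not in vocab:
                vocab.append(word)
    return vocab


def vectorize(text, vocab):
    """Bag-of-words counts restricted to the reduced vocabulary."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocab]


weights = {  # hypothetical topic-word weights, not values from DUTA-10k or CoDA
    "drugs": {"vendor": 0.30, "shipping": 0.20, "escrow": 0.10},
    "hacking": {"exploit": 0.40, "shell": 0.25, "vendor": 0.05},
}
vocab = class_keywords(weights, n=2)
print(vocab)
print(vectorize("Vendor sells exploit exploit kit", vocab))
```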
- Research Article
- 10.35799/jis.v26i1.67193
- Apr 20, 2026
- Jurnal Ilmiah Sains
- Agatha Marilin Saekoko + 2 more
Healthcare services constitute a crucial aspect in improving public well-being. Every individual has the right to receive healthcare services that are of high quality, safe, efficient, and affordable. This study aims to identify and analyze public perceptions and sentiments toward healthcare services at RSUD Soe, as well as to evaluate the performance of several machine learning methods in classifying such sentiments. The data were collected from 278 respondents through a Likert-scale questionnaire that represents perceptions and levels of satisfaction regarding various service aspects. Sentiment analysis was conducted using four machine learning algorithms, namely Naïve Bayes, C4.5, Random Forest, and Support Vector Machine. The results indicate that Naïve Bayes achieved the highest accuracy of 82.14 percent, followed by SVM at 80 percent, Random Forest at 79 percent, and C4.5 at 73.21 percent. This study also applied the Latent Dirichlet Allocation method to identify the main themes within public feedback. LDA generated twelve topics reflecting key issues such as waiting time, availability of medical personnel, facility cleanliness, and the attitudes of healthcare staff. The majority of comments exhibited positive sentiment, particularly concerning staff friendliness and service quality. These findings were used to formulate improvement recommendations, including enhancing service quality, increasing the number of medical personnel, and optimizing facilities. This research demonstrates that a data-driven quantitative approach is effective in evaluating healthcare service quality and supporting more targeted decision-making. The results are expected to assist RSUD Soe in continuously and effectively improving service quality.
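The abstract compares Naïve Bayes, C4.5, Random Forest, and SVM but gives no implementation details. As a hedged sketch of the best-performing model type, here is a from-scratch multinomial Naive Bayes text classifier with add-one smoothing and an accuracy helper. The toy comments and labels are hypothetical, and the study may have used a different Naive Bayes variant or feature set.

```python
import math
from collections import Counter, defaultdict


def train_nb(texts, labels):
    """Multinomial Naive Bayes: class counts and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Most probable class under add-one (Laplace) smoothed likelihoods."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)  # log prior
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    return sum(predict_nb(model, t) == y for t, y in zip(texts, labels)) / len(labels)


train_texts = ["staff very friendly", "clean rooms friendly nurses",
               "long waiting time", "rude staff long queue"]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_nb(train_texts, train_labels)
print(accuracy(model, ["friendly nurses", "long waiting"], ["positive", "negative"]))
```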
- Research Article
- 10.3390/app16083884
- Apr 16, 2026
- Applied Sciences
- Song Song + 2 more
In the context of the aesthetic economy and the rapid development of digital intelligence, product design is increasingly required to address not only functional performance but also users’ emotional needs. However, due to the ambiguity and subjectivity of perceptual requirements, it remains difficult to accurately translate user emotions into specific design solutions. To address this challenge, this study proposes an integrated Kansei Engineering–machine learning framework for optimizing product design. First, user perceptual data are collected through questionnaires and interviews, and key perceptual imagery words are extracted using the Latent Dirichlet Allocation (LDA) model and factor analysis. Then, product design elements are systematically decomposed, and their relative importance is determined using the fuzzy analytic hierarchy process (FAHP). Based on this, a mapping relationship between perceptual imagery and design elements is established. Subsequently, the XGBoost model is employed to predict and optimize design element combinations. The optimized design schemes are further generated using AIGC technology and validated through eye-tracking experiments and subjective evaluations. The results show that the proposed method achieves high predictive accuracy (R² = 0.87) and significantly improves the emotional expression of product design. This study contributes to the integration of Kansei Engineering and machine learning by providing a data-driven approach for emotional design optimization, offering theoretical, practical, and strategic guidance for intelligent product design in industrial contexts.
- Research Article
- 10.3390/info17040367
- Apr 14, 2026
- Information
- Luis Omar Colombo-Mendoza + 3 more
This article introduces the CoLiRa (Computational Literature Review & Analysis) framework, a novel integration of established computational algorithms designed to quantitatively analyze and map the evolution of scientific fields. Employing a human-in-the-loop epistemological approach, CoLiRa combines the scalability of automated algorithms with the semantic coherence of expert-driven qualitative research. The multi-stage pipeline incorporates Latent Dirichlet Allocation (LDA) for thematic discovery, cluster analysis (K-Means and Multidimensional Scaling) for conceptual mapping, and Ordinary Least Squares (OLS) regression to monitor temporal trends. Algorithmic outputs are structurally validated by domain experts using quantitative metrics. The framework’s end-to-end capabilities are demonstrated through a proof-of-concept case study on the semantic enrichment of tabular data, encompassing studies up to 2024 that utilize Semantic Web ontologies, Linked Data, and knowledge graphs. The analysis identifies three core research topics and finds no statistically significant linear trends, suggesting thematic coexistence. This work provides a validated, hybrid computational approach for conducting robust literature reviews and mapping research trajectories.
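The framework uses OLS regression to monitor temporal trends. A minimal stdlib sketch, assuming the regressed quantity is a topic's yearly share of publications (the abstract does not specify): it fits a line against year and returns the slope with its t statistic. The yearly values are invented for illustration.

```python
import math


def ols_trend(years, shares):
    """Fit share = a + b * year by ordinary least squares.

    Returns (b, t), where t = b / SE(b). Under the usual OLS assumptions,
    t follows a t distribution with n - 2 degrees of freedom when the true
    slope is zero, so a small |t| indicates no significant linear trend.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(years, shares))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    if se == 0:
        raise ValueError("perfect linear fit: t statistic is undefined")
    return slope, slope / se


years = [2019, 2020, 2021, 2022, 2023, 2024]
shares = [0.30, 0.32, 0.31, 0.33, 0.32, 0.34]  # hypothetical yearly topic share
slope, t = ols_trend(years, shares)
print(f"slope={slope:.4f} per year, t={t:.2f} on {len(years) - 2} df")
```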
- Research Article
- 10.32877/bt.v8i3.3721
- Apr 10, 2026
- bit-Tech
- Alif Nuryana + 1 more
The rapid growth of scientific output in institutional repositories has created significant challenges for the efficient retrieval of information, particularly when searches rely solely on unstructured metadata. Although topic modelling has been widely applied to large bodies of text, little attention has been given to Indonesian-language repositories and metadata-only datasets harvested through standardized protocols. This study aims to address this issue by using Latent Dirichlet Allocation (LDA) to analyze the research landscape of the Widyatama University Repository, based on titles and abstracts that were collected automatically via the OAI-PMH protocol. The proposed methodology integrates the following processes: automated metadata harvesting; Indonesian-language text preprocessing; probabilistic topic modelling; and quantitative evaluation using coherence metrics, complemented by qualitative interpretability analysis. The experimental results show that the optimal model was achieved with 12 topics, giving a Coherence Score of 0.5546, categorized as 'Good'. This demonstrates that meaningful thematic structures can be extracted even from limited textual metadata. The identified topics reflect the university's main research areas, such as Marketing Management (12.5%), Auditing (12.4%), and Human Resource Management (12.1%), as well as specific domains like Informatics (6.7%). To enhance practical usability, the model outputs were deployed in an interactive, Streamlit-based dashboard enabling dynamic exploration of topic relationships and temporal trends. This study contributes to repository analytics by demonstrating how topic modelling driven by metadata can transform institutional repositories into intelligent systems for discovering knowledge, supporting research navigation, landscape analysis, and evidence-based decision-making for academic management.
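The abstract reports a coherence score but does not name the measure; gensim's `CoherenceModel`, for example, supports several (`'u_mass'`, `'c_v'`, `'c_uci'`, `'c_npmi'`) on different scales. To illustrate choosing the number of topics by coherence, this sketch uses the simple UMass measure on a toy corpus, so its values are not comparable with the reported 0.5546. The candidate topic lists are hypothetical stand-ins for models fitted with different numbers of topics.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic.

    `top_words` is ordered from most to least probable and `docs` is a list
    of token sets. Sums log((D(w_i, w_j) + 1) / D(w_j)) over all pairs with
    j < i, where D counts documents that contain every given word.
    Higher values mean more coherent topics.
    """
    def doc_freq(*words):
        return sum(all(w in doc for w in words) for doc in docs)

    return sum(
        math.log((doc_freq(top_words[i], top_words[j]) + 1) / doc_freq(top_words[j]))
        for i in range(1, len(top_words))
        for j in range(i)
    )


docs = [
    {"marketing", "management", "brand"},
    {"marketing", "brand", "consumer"},
    {"audit", "accounting", "fraud"},
    {"audit", "accounting", "tax"},
]
candidates = {  # hypothetical top words from models fitted with k topics
    2: [["marketing", "brand"], ["audit", "accounting"]],
    3: [["marketing", "audit"], ["brand", "tax"], ["accounting", "fraud"]],
}
mean_scores = {
    k: sum(umass_coherence(t, docs) for t in topics) / len(topics)
    for k, topics in candidates.items()
}
best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores, "best k:", best_k)
```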
- Research Article
- 10.2196/80824
- Apr 7, 2026
- Online journal of public health informatics
- Danielle Hutchinson + 5 more
Public opinion, which may be influenced by personal experiences, news, and social media, can impact compliance with public health measures (PHMs) during health emergencies. Artificial intelligence (AI) tools offer opportunities to analyze public opinion in real time during health emergencies. However, their performance in accurately identifying sentiment and themes in health-related online content remains unclear. This study aimed to evaluate the performance of natural language processing-based and large language model (LLM)-based AI tools when compared to human coding for sentiment analysis, topic modeling, and thematic analysis of public health datasets. Tools were selected to reflect those available to public health analysts and decision-makers. Data were collected via Google Alerts (GA) and social media posts from X (formerly known as Twitter) relevant to COVID-19 mitigation PHMs from December 2022 to February 2023. Following relevance screening, the sentiment of the complete datasets was analyzed by a human rater, with descriptive statistics used to summarize the overall sentiment profile. Subsets of 400 GA articles and 400 tweets were manually coded for sentiment by 2 human raters. Results were compared with outputs from 5 AI tools, including VADER (Valence Aware Dictionary and Sentiment Reasoner), SentimentGI, SentimentQDAP, Microsoft Azure, and OpenAI's ChatGPT-4. Topic modeling of the GA and X datasets was conducted using latent Dirichlet allocation in R and zero-shot prompting in ChatGPT-4 and compared with manual topic summaries. Thematic analysis of positive and negative sentiment datasets was conducted by a human rater and ChatGPT-4, with outputs evaluated for proficiency and reasonableness. Of 2227 GA results and 3484 tweets, 58% (n=1238) and 71% (n=2473), respectively, were relevant to PHMs.
Human-coded sentiment analysis showed mostly neutral reporting in the news media, while social media expressed more polarized views. Across both datasets, AI tools demonstrated poor concordance with human-coded sentiment (Cohen κ <0.5 for all tools and sentiment categories). Topic modeling with ChatGPT-4 aligned more closely with human-rated topics than latent Dirichlet allocation, and of the 20 LLM-generated thematic outputs, 13 were rated proficient, and 7 were rated partially proficient. LLM outputs provided coherent, high-level summaries but lacked contextual insight. Human and LLM thematic analyses both identified themes of vaccine effectiveness, debate regarding PHMs, and public trust. Accessible AI tools demonstrate limited reliability for sentiment classification of health-related online text but show promise for rapid thematic exploration when combined with human oversight. These tools could complement traditional qualitative research in the context of health emergencies; however, they require human review to enhance the accuracy of interpretation. Further research is needed for non-English datasets.
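Concordance in this study is reported as Cohen's κ. For reference, κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e the agreement expected by chance from each rater's label frequencies. A minimal stdlib sketch with hypothetical labels (not the study's data):

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    p_o: fraction of items where the raters agree.
    p_e: agreement expected if each rater labeled independently at their
    own marginal rates. Undefined when p_e == 1.
    """
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


human = ["pos", "neg", "neu", "neu", "pos", "neg"]  # hypothetical human codes
tool = ["pos", "neu", "neu", "neu", "neg", "neu"]   # hypothetical tool output
print(round(cohen_kappa(human, tool), 3))
```

Here the raters agree on half the items, but because chance agreement is one third, κ is only 0.25.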
- Research Article
- 10.32819/202602
- Apr 5, 2026
- Agrology
- T Chetvertak + 3 more
The destruction of the Kahovka Dam in June 2023 caused one of the largest ecological disasters in contemporary Europe and triggered profound hydrological, geomorphological, and biotic transformations across the Lower Dnipro region. This catastrophe created an urgent need for practical ecological research and for broader theoretical generalization within the field of catastrophic ecology. The objective of this study was to determine how catastrophic ecosystem change is structured in the international scientific literature and to clarify the conceptual position of the Kahovka catastrophe within this broader research landscape. A bibliographic corpus of publications on catastrophic changes in ecosystems was analyzed using text mining of titles, abstracts, and keywords. After preprocessing and filtering of lexical units, Latent Dirichlet Allocation was applied to identify the main thematic blocks of the literature. Their relationships were examined using Multidimensional Scaling, while conceptual structuring was assessed through citation analysis using generalized linear modeling and topic-specific distances from thematic centroids. The analysis revealed five major thematic directions: catastrophic shifts of ecosystems, climate change as a cause of ecosystem catastrophes, habitat extinction in the context of ecosystem catastrophes, catastrophic disturbance and forest ecosystem reorganization, and catastrophic change in aquatic and riparian ecosystems. The multidimensional configuration showed that catastrophic ecology is organized as a differentiated semantic field rather than as a single homogeneous discourse. The climate-change and aquatic-riparian blocks demonstrated the strongest increase in prominence over time, whereas the disturbance block showed a gradual decline.
At the same time, the catastrophic-shift and disturbance themes displayed clearer conceptual structuring, because citation-effective publications were concentrated closer to their thematic centroids. These findings indicate that catastrophic ecology combines a central theoretical discourse on resilience, thresholds, and regime shifts with several partially autonomous applied domains related to climate, biodiversity loss, disturbance, and water-related transformations. In this perspective, the Kahovka catastrophe can be understood as a contemporary large-scale case that connects several of these semantic directions simultaneously and therefore provides an important basis for the further development of catastrophic ecology as an empirical and theoretical field.
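The study relates citation performance to each publication's distance from its thematic centroid. The abstract does not name the space or metric, so this sketch assumes Euclidean distance between LDA topic-proportion vectors and simply compares mean distances of highly cited versus other papers, using invented values and a hypothetical citation cut-off of 50, rather than the generalized linear model the authors fitted.

```python
import math


def centroid(vectors):
    """Component-wise mean of equally long vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def distances_to_centroid(vectors):
    """Euclidean distance of each vector from the group centroid."""
    center = centroid(vectors)
    return [math.dist(v, center) for v in vectors]


papers = [  # hypothetical (topic proportions, citation count) in one thematic block
    ([0.8, 0.1, 0.1], 120),
    ([0.7, 0.2, 0.1], 90),
    ([0.4, 0.3, 0.3], 5),
    ([0.5, 0.1, 0.4], 8),
]
dists = distances_to_centroid([props for props, _ in papers])
highly_cited = [d for d, (_, c) in zip(dists, papers) if c >= 50]
others = [d for d, (_, c) in zip(dists, papers) if c < 50]
print(sum(highly_cited) / len(highly_cited), sum(others) / len(others))
```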
- Research Article
- 10.1080/13504851.2026.2654786
- Apr 5, 2026
- Applied Economics Letters
- Wenqi Li + 2 more
Accurately forecasting carbon emission allowance (CEA) prices is essential for supporting China’s low-carbon transition, yet the national carbon market remains volatile and highly sensitive to policy signals. This study develops a prediction framework that incorporates topic information and sentiment measures by integrating unstructured news reports with historical price data. We first identify thematic structures in carbon-related news using Latent Dirichlet Allocation (LDA) and four large language models (LLMs) drawn from two categories, one emphasizing deep reasoning capability and the other rapid response, and then extract topic-specific sentiment using the same LLMs and a traditional lexicon-based sentiment approach. These sentiment indicators are incorporated into a long short-term memory (LSTM) forecasting model, and structural break analysis is employed to capture regime shifts in market dynamics. Empirical results show that sentiment-augmented models consistently outperform price-only benchmarks. SHapley Additive exPlanations (SHAP) analysis further indicates that discourse related to market trading contributes substantially to predictive improvement.
- Research Article
- 10.1111/ajfs.70045
- Apr 4, 2026
- Asia-Pacific Journal of Financial Studies
- Sungju Yang + 1 more
This study develops an explainable machine learning model to predict cryptocurrency delistings using Binance data. It combines quantitative indicators (price, volume) with qualitative data from real‐time news and Reddit. Latent Dirichlet Allocation (LDA) is used to extract topic trends and community reactions, which are transformed into time‐series features. XGBoost, LightGBM, and CatBoost are compared, with SHAP applied for model interpretability. Results show that sharp price drops, repeated risk‐topic exposure, and Reddit responses strongly predict delisting. XGBoost achieves the best performance, offering practical insights for early warning systems and investor protection.