- New
- Research Article
- 10.1186/s40537-025-01337-w
- Jan 6, 2026
- Journal of Big Data
- Ali Nazari + 1 more
Abstract: Organizations face growing challenges in deriving meaningful insights from vast amounts of specialized text data. Conventional topic modeling techniques are typically static and unsupervised, making them ill-suited for fast-evolving fields like quantum cryptography. These models lack contextual awareness and cannot easily incorporate emerging expert knowledge or subtle shifts in subdomains. Moreover, they often overlook rare but meaningful terms, limiting their ability to surface early signals or align with the expert-driven insights essential for strategic understanding. To address these gaps, we employ design science research methodology to create a framework that enhances topic modeling by weighting aspects based on expert-informed input. It iteratively combines expert-curated keywords with topic distributions to improve topic relevance and document alignment accuracy in specialized research areas. The framework comprises four phases: (1) initial topic modeling, (2) expert-informed aspect definition, (3) supervised document alignment using cosine similarity, and (4) iterative refinement until convergence. Applied to quantum communication research, this method improved the visibility of critical but low-frequency terms, enhanced topic coherence, and aligned topics with the cryptographic priorities identified by experts. Compared to the baseline model, the framework increased intra-cluster similarity and reclassified a substantial portion of documents into more thematically accurate clusters. An evaluation on QCrypt 2023 and 2024 conference papers showed that the model adapts well to changing discussions, capturing a shift from theoretical foundations to implementation challenges. This study illustrates that expert-guided, aspect-weighted topic modeling boosts interpretability, adaptability, and strategic relevance, providing a scalable solution for tracking knowledge in fast-paced research domains.
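The cosine-similarity alignment step named in phase (3) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy topic distributions, and the expert aspect weight vectors below are all hypothetical, and the sketch assumes each document and each aspect is represented as a vector over the same set of topics.

```python
import numpy as np

def cosine_similarity_matrix(docs, aspects):
    """Return an (n_docs, n_aspects) matrix of cosine similarities."""
    docs = np.asarray(docs, dtype=float)
    aspects = np.asarray(aspects, dtype=float)
    doc_norms = np.linalg.norm(docs, axis=1, keepdims=True)
    aspect_norms = np.linalg.norm(aspects, axis=1, keepdims=True)
    # Guard against zero vectors so the division below is always defined.
    doc_norms[doc_norms == 0] = 1.0
    aspect_norms[aspect_norms == 0] = 1.0
    return (docs / doc_norms) @ (aspects / aspect_norms).T

def align_documents(docs, aspects):
    """Assign each document to the aspect it is most similar to."""
    sims = cosine_similarity_matrix(docs, aspects)
    return sims.argmax(axis=1), sims

# Hypothetical toy data: 3 documents described over 4 topics.
doc_topics = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.4, 0.4, 0.1, 0.1],
]
# Hypothetical expert-weighted aspects over the same 4 topics.
aspect_weights = [
    [1.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 1.0],
]

labels, sims = align_documents(doc_topics, aspect_weights)
print(labels.tolist())
```

In an iterative setting, the assignments would presumably feed back into the aspect definitions until they stop changing; the abstract does not specify the convergence criterion, so none is shown here.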
- New
- Research Article
- 10.1186/s40537-025-01352-x
- Jan 4, 2026
- Journal of Big Data
- Doyoung Kim + 6 more
- New
- Research Article
- 10.1186/s40537-025-01339-8
- Jan 4, 2026
- Journal of Big Data
- Haoran Ding + 1 more
- New
- Research Article
- 10.1186/s40537-025-01324-1
- Dec 30, 2025
- Journal of Big Data
- Ghadeer Yousef + 1 more
- New
- Research Article
- 10.1186/s40537-025-01325-0
- Dec 23, 2025
- Journal of Big Data
- Zhenjie Yao + 5 more
- New
- Research Article
- 10.1186/s40537-025-01183-w
- Dec 23, 2025
- Journal of Big Data
- Ana M Maitin + 3 more
- New
- Research Article
- 10.1186/s40537-025-01345-w
- Dec 19, 2025
- Journal of Big Data
- Jui-Sheng Chou + 1 more
Abstract: Effective risk management is crucial in the construction industry, which has a substantial economic impact but is vulnerable to high financial risks due to volatile material costs and complex project-based financial structures. This study presents a new hybrid model to improve the prediction of financial distress for Taiwan-listed construction companies. The research compares four boosting-based ensemble learning models, advanced deep learning models, and improved ensemble models that incorporate a novel approach using a Multi-Criteria Decision-Making (MCDM) technique, the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), to enhance feature selection. Experimental results show that while TOPSIS-eXtreme Gradient Boosting (TOPSIS-XGBoost) is highly effective at managing imbalanced financial datasets, Light Gradient Boosting Machine (LightGBM) performs better in balanced environments. Both models exhibit substantial performance gains when integrated with the Forensic-Based Investigation (FBI) optimization algorithm. The resulting optimized hybrids, FBI-TOPSIS-XGBoost and FBI-LightGBM, achieve marked improvements in predictive accuracy and consistently outperform benchmark approaches, including the Altman Z-score, Zmijewski X-score, Logistic Regression, and Random Forest, across multiple evaluation metrics. To enhance transparency and interpretability, a global SHapley Additive exPlanations (SHAP) analysis was conducted, revealing that profitability and per-share index indicators are the primary determinants driving model predictions. Additionally, an expert system interface has been developed to enhance the practical usability of these models. These findings strengthen the methodological foundation for predicting financial distress and provide stakeholders with valuable tools for mitigating risk in Taiwan's construction industry.
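For readers unfamiliar with TOPSIS, the following is a minimal sketch of the standard procedure, applied to ranking candidate features. It is a generic illustration, not the study's pipeline: the abstract does not state which criteria or weights were used, so the two criteria here (a relevance score to maximize and a redundancy score to minimize), their weights, and the toy values are hypothetical.

```python
import numpy as np

def topsis_scores(matrix, weights, benefit):
    """Return the TOPSIS closeness coefficient for each row (alternative).

    matrix  : (n_alternatives, n_criteria) decision matrix
    weights : criterion weights (normalized to sum to 1 inside)
    benefit : True where larger is better, False where smaller is better
    """
    X = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    benefit = np.asarray(benefit, dtype=bool)

    # Vector-normalize each criterion column, then apply the weights.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    V = (X / norms) * w

    # Ideal and anti-ideal points depend on the criterion direction.
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))

    d_pos = np.linalg.norm(V - ideal, axis=1)  # distance to ideal
    d_neg = np.linalg.norm(V - anti, axis=1)   # distance to anti-ideal
    denom = d_pos + d_neg
    denom[denom == 0] = 1.0  # all alternatives identical: score 0
    return d_neg / denom

# Hypothetical candidate features scored on two hypothetical criteria:
# column 0 = relevance (benefit), column 1 = redundancy (cost).
features = ["feature_a", "feature_b", "feature_c"]
decision_matrix = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.1, 0.9],
]
scores = topsis_scores(decision_matrix, weights=[0.5, 0.5], benefit=[True, False])
ranking = np.argsort(-scores)  # best feature first
print([features[i] for i in ranking])
```

A feature-selection step built on this would keep the top-ranked features and pass them to the downstream classifier; how many are retained is a design choice the abstract does not specify.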
- New
- Research Article
- 10.1186/s40537-025-01330-3
- Dec 19, 2025
- Journal of Big Data
- Shafique Ahmed Awan + 7 more
- Research Article
- 10.1186/s40537-025-01302-7
- Dec 17, 2025
- Journal of Big Data
- Michał Zaręba + 4 more
Abstract: Analyzing defensive actions, which have traditionally received less attention than offensive metrics, is a significant challenge in football analytics. This research presents a methodology that uses XGBoost and deep neural networks to evaluate defensive performance with metrics such as On-Ball Value (OBV), Valuing Actions by Estimating Probabilities (VAEP), and eXpected Threat (xT). The study proposes a machine learning-based framework for evaluating the value of defensive players. A case study using expert ratings and market values from the Polish PKO BP Ekstraklasa demonstrates the method's effectiveness. The results advance the field of sports analytics by addressing the persistent problem of accurately valuing the defensive contributions of football players.
- Research Article
- 10.1186/s40537-025-01329-w
- Dec 16, 2025
- Journal of Big Data
- Laiba Sabir + 5 more