Decoding public sentiment on pension policies in China through natural language processing
This study aims to reveal public sentiment toward China’s pension policies from January 2018 to August 2023, leveraging over 260,000 Weibo posts to identify key themes and demographic differences. Advanced Natural Language Processing (NLP) techniques, including sentiment analysis and latent Dirichlet allocation, are employed to explore six topics, such as societal impact and policy integrity, while uncovering demographic and regional variations. The findings reveal that policy changes significantly influence public sentiment, with greater negativity observed around institutional and structural aspects of the policies. These results underscore the need for public education on pension reforms and fraud prevention, providing actionable insights for policymakers in an aging society. The study contributes to behavioural finance theory by illustrating how heuristics like availability bias and loss aversion shape public reactions to pension reforms. However, social media data may not fully represent less active groups like older adults, highlighting the need for broader research methods.
- Research Article
- 10.1155/jonm/2857497
- Jan 1, 2024
- Journal of nursing management
Aim: This scoping review aimed to identify and synthesize the evidence in existing nursing studies that used natural language processing to analyze social media data, along with the relevant procedures, techniques, tools, and ethical issues. Background: Social media has become widely integrated into both everyday life and the nursing profession, resulting in the accumulation of extensive nursing-related social media data. Analyzing such data facilitates the generation of evidence, thereby aiding the formation of better policies. Natural language processing has emerged as a promising methodology for analyzing social media data in the field of nursing. However, the extent of natural language processing applications in analyzing nursing-related social media data remains unknown. Evaluation: A scoping review was conducted. PubMed, CINAHL, Web of Science, and IEEE Xplore were searched. Studies were screened against inclusion criteria, and relevant data were extracted and summarized using a descriptive approach. Key Issues: In total, 38 studies were included in the final analysis. Topic modeling and sentiment analysis were the most frequently employed natural language processing techniques. The most used topic modeling algorithm was latent Dirichlet allocation. The dictionary-based approach was the most utilized sentiment analysis approach, and the National Research Council Sentiment and Emotion Lexicons were the most used sentiment dictionaries. Natural language processing tools such as Python (the NLTK, Jieba, spaCy, and KoNLP libraries) and R (the LDAvis, Jaccard, ldatuning, and SentiWordNet packages) were documented. A significant proportion of the included studies did not obtain ethical approval and did not anonymize social media users' information.
Conclusion: This scoping review summarized the extent of natural language processing techniques adoption in nursing and relevant procedures and tools, offering valuable resources for researchers who are interested in discovering knowledge from social media data. The study also highlighted that the application of natural language processing for analyzing nursing-related social media data is still emerging, indicating opportunities for future methodological improvements. Implications for Nursing Management: There is a need for a standardized management framework for conducting and reporting studies using natural language processing techniques in the analysis of nursing-related social media data. The findings could inform the development of regulatory policies by nursing authorities.
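The dictionary-based sentiment approach the review found most common can be sketched in a few lines. The tiny lexicon below is a hypothetical stand-in for a real resource such as the NRC Emotion Lexicon.

```python
# Minimal dictionary-based sentiment scoring. The toy lexicon is a stand-in
# for a real sentiment dictionary (e.g., the NRC lexicons named above).
LEXICON = {
    "good": 1, "great": 1, "caring": 1, "helpful": 1,
    "bad": -1, "rude": -1, "unsafe": -1, "stressful": -1,
}

def sentiment_score(text: str) -> int:
    """Sum lexicon polarities over the tokens of a lowercased text."""
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())

def label(text: str) -> str:
    s = sentiment_score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

print(label("great caring nurses"))        # -> positive
print(label("rude and stressful shift"))   # -> negative
```

Real applications would add tokenization, negation handling, and a full lexicon, but the scoring logic is essentially this lookup-and-sum.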
- Research Article
- 10.3389/frai.2025.1627078
- Sep 11, 2025
- Frontiers in Artificial Intelligence
Introduction: Mental disorders are highly prevalent in modern society, leading to substantial personal and societal burdens. Among these, depression is one of the most common, often exacerbated by socioeconomic, clinical, and individual risk factors. With the rise of social media, user-generated content offers valuable opportunities for the early detection of mental disorders through computational approaches. Methods: This study explores the early detection of depression using black-box machine learning (ML) models, including Support Vector Machines (SVM), Random Forests (RF), Extreme Gradient Boosting (XGB), and Artificial Neural Networks (ANN). Advanced Natural Language Processing (NLP) techniques, including TF-IDF, Latent Dirichlet Allocation (LDA), N-grams, Bag of Words (BoW), and GloVe embeddings, were employed to extract linguistic and semantic features. To address the interpretability limitations of black-box models, Explainable AI (XAI) methods were integrated, specifically Local Interpretable Model-Agnostic Explanations (LIME). Results: Experimental findings demonstrate that SVM achieved the highest accuracy in detecting depression from social media data, outperforming RF and the other models. The application of LIME enabled granular insights into model predictions, highlighting linguistic markers strongly aligned with established psychological research. Discussion: Unlike most prior studies that focus primarily on classification accuracy, this work emphasizes both predictive performance and interpretability. The integration of LIME not only enhanced transparency but also improved the potential clinical trustworthiness of ML-based depression detection models.
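One of the feature/model pairings the study lists, TF-IDF features feeding a linear SVM, can be sketched as follows. The labeled mini-corpus is fabricated purely for illustration; real work would use a curated, ethically sourced dataset.

```python
# Sketch: TF-IDF features + linear SVM, one pairing named in the abstract.
# The six labeled texts are invented examples, not study data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = [
    "feeling hopeless and exhausted every day",
    "cant sleep, everything feels pointless",
    "no energy, lost interest in everything",
    "great hike with friends this morning",
    "excited about the new project at work",
    "lovely dinner and good laughs tonight",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = depression-indicative, 0 = control

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)

print(model.predict(["feels pointless and hopeless"]))
```

In the study's setting, an explainer such as LIME would then be run over individual predictions to surface the words driving each classification.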
- Research Article
- 10.3390/ijerph192013248
- Oct 14, 2022
- International journal of environmental research and public health
The COVID-19 pandemic has created unprecedented burdens on people's health and subjective well-being. While countries around the world have established models to track and predict affective states during COVID-19, research identifying the topics of public discussion and the evolution of sentiment toward the vaccine, particularly differences in the topics of concern between vaccine-support and vaccine-hesitant groups, remains scarce. Using social media data from the two years following the outbreak of COVID-19 (23 January 2020 to 23 January 2022), coupled with state-of-the-art natural language processing (NLP) techniques, we developed a public opinion analysis framework (BertFDA). First, using dynamic topic clustering on Weibo through the latent Dirichlet allocation (LDA) model, a total of 118 topics were generated over 24 months from 2,211,806 microblog posts. Second, by building an improved BERT pre-training model for sentiment classification, we provide evidence that public negative sentiment continued to decline in the early stages of COVID-19 vaccination. Third, by modeling and analyzing the microblog posts from the vaccine-support group and the vaccine-hesitant group, we discover that the vaccine-support group was more concerned about vaccine effectiveness and news reporting, reflecting greater group cohesion, whereas the vaccine-hesitant group was particularly concerned about the spread of coronavirus variants and vaccine side effects. Finally, we deployed different machine learning models to predict public opinion. Moreover, functional data analysis (FDA) is used to build a functional sentiment curve, which effectively captures dynamic sentiment changes with an explicit functional form. This study can aid governments in developing effective interventions and education campaigns to boost vaccination rates.
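The functional sentiment curve built with FDA can be loosely illustrated by smoothing a daily sentiment series. A centered moving average is used here as the simplest possible smoother, standing in for the study's basis-function approach, and the daily values are invented.

```python
# Crude stand-in for a functional sentiment curve: smooth a noisy daily
# negative-sentiment series. (The study uses functional data analysis with
# basis functions; a moving average is just the simplest smoother.)
def moving_average(series, window=3):
    """Centered moving average; edges use only the available neighbors."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

# Invented daily share of negative posts, declining after vaccine rollout
daily_negative_share = [0.62, 0.58, 0.60, 0.51, 0.47, 0.45, 0.40]
curve = moving_average(daily_negative_share)
print([round(v, 3) for v in curve])
```

A fitted functional curve, unlike the raw series, can be differentiated and compared across groups, which is what makes the FDA formulation useful.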
- Research Article
- 10.1016/j.mex.2025.103407
- Jun 1, 2025
- MethodsX
Evaluating sentiment analysis models: A comparative analysis of vaccination tweets during the COVID-19 phase leveraging DistilBERT for enhanced insights.
- Research Article
- 10.1108/dprg-09-2024-0240
- Mar 3, 2025
- Digital Policy, Regulation and Governance
Purpose This study aims to analyze public discourse on decentralized finance (DeFi) and central bank digital currencies (CBDC) using advanced natural language processing (NLP) techniques to uncover key insights that can guide financial policy and innovation. This research seeks to fill a gap in the existing literature by applying state-of-the-art NLP models such as BERT and RoBERTa to understand the evolving online discourse around DeFi and CBDC. Design/methodology/approach This study uses multilabel classification with BERT and RoBERTa models, alongside BERTopic for topic modeling. Data are collected from social media platforms, including Twitter and LinkedIn, as well as relevant documents, to analyze public sentiment and discourse. Model performance is evaluated based on accuracy, precision, recall, and F1-scores. Findings RoBERTa outperforms BERT in classification accuracy and precision across all metrics, making it more effective in categorizing public discourse on DeFi and CBDC. BERTopic identifies five frequently discussed topics, such as financial inclusion and competition and growth in DeFi, with important implications for policymakers. Practical implications The insights derived from this study provide valuable information for financial regulators and policymakers seeking to develop more informed, data-driven strategies for implementing and regulating DeFi and CBDC. Public discourse analysis enables policymakers to understand emerging concerns and trends critical to crafting effective financial policies. Originality/value This study is among the first to use advanced NLP models, including RoBERTa and BERTopic, to analyze public discourse on DeFi and CBDC. It offers novel insights into the potential challenges and opportunities these innovations present, and it contributes to the growing body of research on the intersection of digital financial technologies and public sentiment.
- Preprint Article
- 10.2196/preprints.72853
- Feb 20, 2025
BACKGROUND Unstructured patient feedback (UPF) allows patients to freely express their experiences without the constraints of predefined questions. The proliferation of online healthcare rating websites has created a vast source of UPF. Natural language processing (NLP) techniques, particularly sentiment analysis and topic modelling, are increasingly being used to analyse UPF in healthcare settings; however, the scope and clinical relevance of these technologies are unclear. OBJECTIVE This scoping review investigates how NLP techniques are being used to interpret UPF, with a focus on the healthcare settings in which they are used, the purposes for using these technologies, and any reported impacts on clinical practice. METHODS Searches of MEDLINE, EMBASE, CINAHL, the Cochrane Database of Reviews, and Google Scholar were conducted in February 2024. No date limits were applied. English-language studies that used NLP techniques on UPF pertaining to an identifiable healthcare setting or provider were included. Data extraction focused on the healthcare setting, the NLP methods used, and the applications of these techniques. RESULTS 52 studies were included. NLP was most commonly applied to UPF from secondary care settings (n=33), with fewer studies in primary (n=10) or community (n=5) care. Three NLP techniques were identified in the included studies: sentiment analysis (n=32), topic modelling (n=15), and text classification (n=7). Sentiment analysis was applied to explore associations between patient sentiment and healthcare provider characteristics, track emotional responses over time, and identify areas for improvement in healthcare delivery. Topic modelling, primarily using the Latent Dirichlet Allocation (LDA) algorithm, was employed to uncover latent themes in patient feedback, compare patient experiences across different healthcare settings, and track changes in patient concerns over time. Text classification was used to categorize patient feedback into predefined topics.
The association between NLP-derived insights and traditional healthcare quality metrics was limited, with few studies describing concrete clinical impacts resulting from their analyses. CONCLUSIONS NLP has been applied to UPF across a number of contexts, primarily to identify features of health services or professionals that support a good patient experience. The growth in research publications demonstrates academic interest in these technologies, but there is little evidence that these approaches are being employed in clinical settings. Future research is required to assess how NLP may capture the nuance of healthcare interactions, how it aligns with existing quality metrics, and how it may be used to influence clinician behaviour.
- Research Article
- 10.62131/mlaj-v3-n1-015
- Feb 11, 2025
- Multidisciplinary Latin American Journal (MLAJ)
The paper explores how financial decisions affect corporate financial flexibility, using advanced Natural Language Processing (NLP) techniques. The research analyzes scientific abstracts collected from Scopus to identify latent topics using the Latent Dirichlet Allocation (LDA) model. The results highlight five main topics: financial management, corporate governance, social inclusion, monetary policies, and adaptability. The first, dominant theme encompasses concepts such as investment planning and debt management, while the last focuses on social and environmental interactions. Significant patterns are identified, such as the positive impact of flexibility on strategic investments and its relationship with dividend policies and quality financial reporting. However, excess liquidity may limit long-term profitability. This analysis reveals the growing interdisciplinarity of financial decisions, integrating technological, social, and cultural factors. The applied methodology highlights the usefulness of LDA in synthesizing large volumes of information and delves into how financial flexibility mitigates risks and enhances opportunities in dynamic markets. The findings provide a solid foundation for future research and the optimization of financial strategies.
- Research Article
- 10.1007/s11301-024-00479-0
- Dec 4, 2024
- Management Review Quarterly
Leadership is recognized as playing a crucial role in an organization's performance and success. As a result, the scientific literature on leadership has become quite extensive, making it difficult to identify and understand the current state of research. Most literature studies focus on a specific aspect of the field or a limited time frame, providing a fragmented view of the overall landscape. Therefore, this research aims to provide new insights into the current state of research through two studies. Using advanced Natural Language Processing (NLP) techniques, the first study focuses on identifying emerging research trends in the field through a Latent Dirichlet Allocation (LDA) model, providing insights into future areas of interest and investigation. The second study centers on analyzing consolidated research patterns through co-word and network analysis, shedding light on the connections and interrelationships between leadership research topics. By applying these techniques to a comprehensive dataset of 56,547 research papers gathered from Web of Science and Scopus, this study provides a detailed understanding of the current state of leadership research and identifies potential areas for future exploration. Five research trends were identified: (1) Leadership and Digital Transformation Research (LDTR); (2) Leadership and Organizational Performance Research (LOPR); (3) Educational Leadership Research (ELR); (4) Leadership Practices and Development Research (LPDR); and (5) Gender and Diversity Leadership Research (GDLR). Combining these five research trends with the consolidated research patterns, we propose several directions for advancing leadership studies.
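The co-word analysis used in the second study boils down to counting keyword co-occurrences across papers. A minimal sketch, with invented keyword lists standing in for the study's Web of Science/Scopus records:

```python
# Minimal co-word analysis: count how often keyword pairs co-occur within
# the same paper's keyword list. Keyword lists are invented examples.
from itertools import combinations
from collections import Counter

paper_keywords = [
    ["leadership", "digital transformation", "change"],
    ["leadership", "performance", "teams"],
    ["leadership", "digital transformation", "innovation"],
    ["gender", "diversity", "leadership"],
]

cooccurrence = Counter()
for kws in paper_keywords:
    # sorted() normalizes pair order so (a, b) and (b, a) count together
    for a, b in combinations(sorted(set(kws)), 2):
        cooccurrence[(a, b)] += 1

# The strongest link is the pair appearing together most often
pair, count = cooccurrence.most_common(1)[0]
print(pair, count)
```

In practice the resulting co-occurrence counts become edge weights in a keyword network, on which centrality and clustering measures reveal consolidated research patterns.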
- Research Article
- 10.2196/29768
- Nov 29, 2021
- JMIR Medical Informatics
Background: A new illness can come to public attention through social media before it is medically defined, formally documented, or systematically studied. One example is a condition known as breast implant illness (BII), which has been extensively discussed on social media, although it is vaguely defined in the medical literature. Objective: The objective of this study is to construct a data analysis pipeline to understand emerging illnesses using social media data and to apply the pipeline to understand the key attributes of BII. Methods: We constructed a pipeline of social media data analysis using natural language processing and topic modeling. Mentions related to signs, symptoms, diseases, disorders, and medical procedures were extracted from social media data using the clinical Text Analysis and Knowledge Extraction System. We mapped the mentions to standard medical concepts and then summarized these mapped concepts as topics using latent Dirichlet allocation. Finally, we applied this pipeline to understand BII from several BII-dedicated social media sites. Results: Our pipeline identified topics related to toxicity, cancer, and mental health issues that were highly associated with BII. Our pipeline also showed that cancers, autoimmune disorders, and mental health problems were emerging concerns associated with breast implants, based on social media discussions. Furthermore, the pipeline identified mentions such as rupture, infection, pain, and fatigue as common self-reported issues among the public, as well as concerns about toxicity from silicone implants. Conclusions: Our study could inspire future studies on the suggested symptoms and factors of BII. Our study provides the first analysis and derived knowledge of BII from social media using natural language processing techniques and demonstrates the potential of using social media information to better understand similar emerging illnesses.
- Research Article
- 10.1158/1557-3265.adi21-po-092
- Mar 1, 2021
- Clinical Cancer Research
Introduction Seventy percent of lung cancer patients are diagnosed at advanced stages. Lung cancer screening (LCS) can potentially produce a stage shift through early detection of the disease. The 2013 LCS guideline from the U.S. Preventive Services Task Force (USPSTF) recommended screening with low-dose computed tomography (LDCT) for individuals aged between 55 and 80 with a 30 pack-year smoking history (i.e., current smokers or those who had quit smoking within 15 years). However, the high false-positive rate of LCS with LDCT is one of the concerns that hinder the uptake of LCS in real-world settings. An electronic health record (EHR)-based computable phenotyping (CP) algorithm that accurately identifies patients who meet the LCS eligibility criteria could improve the reach of the screening-eligible population and thereby increase the uptake of LCS. Objective To develop an EHR-based CP algorithm to identify patients eligible for LCS. Method The LCS CP algorithm was developed to extract quantitative smoking information (i.e., pack-years, smoking years, quit year) from both structured EHR data and unstructured clinical notes, enabled by advanced natural language processing (NLP) methods. The study cohort consisted of 3,080 patients who received LCS with LDCT based on procedure codes, as documented in EHR data from the UF Health Integrated Data Repository (IDR). The EHR-based LCS CP algorithm included two modules: one to extract smoking information from both structured EHR data and clinical notes using NLP techniques, and the other to integrate the extracted results based on the CP rules (e.g., pack-years > 30; quit year within 15 years; age 55-80) to determine whether a patient is eligible for LCS. For initial evaluation, we conducted a chart review of 20 randomly selected patients and compared the CP algorithm outcomes with the chart review results.
Results and Discussion The manual chart review of the 20 patients who underwent LCS with LDCT identified that 13 patients qualified for LCS, 6 patients did not qualify, and 1 patient was undecidable. Based on this gold-standard dataset, the CP algorithm achieved a specificity of 1.00 and a sensitivity of 0.92. Without smoking information extracted from clinical notes using NLP, the specificity score dropped to 0.80. Our results indicate that clinical notes are an important source of information on smoking histories. For all smoking-related information extracted from the clinical notes, smoking history was consistent with the structured EHR in 60% of cases and inconsistent in 10% of cases, with the remaining 30% missing. Our results point to (1) suboptimal documentation of smoking information in EHRs, (2) the added value of artificial intelligence methods such as NLP in improving CP performance, and (3) the potential of an EHR-based CP to accurately identify patients eligible for LCS, with potential relevance to clinical decision support. As the upcoming USPSTF LCS guideline is changing (i.e., from 30 pack-years to 20 pack-years), the CP needs to be refined to reflect the changes. Citation Format: Shuang Yang, Tianchen Lyu, Xi Yang, Yonghui Wu, Yi Guo, Michelle Alvarado, Hiren J. Mehta, Ramzi G. Salloum, Dejana Braithwaite, Jinhai Huo, Ya-Chen Tina Shih, Jiang Bian. Developing a computable phenotype to identify populations eligible/ineligible for lung cancer screening [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-092.
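The rule-integration module the abstract describes can be sketched as a simple eligibility function. The thresholds follow the 2013 USPSTF criteria quoted above (age 55-80, 30 pack-years, quit within 15 years); the exact boundary handling (inclusive 30 pack-years) and the function signature are assumptions for illustration.

```python
# Sketch of the CP rule-integration step: combine extracted smoking facts
# into an eligibility decision. Thresholds mirror the 2013 USPSTF criteria
# described in the abstract; boundary handling is an assumption.
from typing import Optional

def lcs_eligible(age: int, pack_years: float,
                 years_since_quit: Optional[float]) -> bool:
    """years_since_quit is None for current smokers."""
    if not (55 <= age <= 80):
        return False
    if pack_years < 30:
        return False
    if years_since_quit is not None and years_since_quit > 15:
        return False
    return True

print(lcs_eligible(62, 35, None))   # current smoker, heavy history -> True
print(lcs_eligible(62, 35, 20))     # quit 20 years ago -> False
print(lcs_eligible(50, 40, None))   # below the age range -> False
```

The hard part in practice is upstream of this function: extracting reliable pack-year and quit-year values from free-text notes, which is where the NLP module earns its keep.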
- Research Article
- 10.1016/j.healthplace.2023.102968
- Jan 1, 2023
- Health & Place
How the natural environment in downtown neighborhood affects physical activity and sentiment: Using social media data and machine learning.
- Research Article
- 10.3389/ebm.2025.10389
- Feb 28, 2025
- Experimental biology and medicine (Maywood, N.J.)
Topic modeling is a crucial technique in natural language processing (NLP), enabling the extraction of latent themes from large text corpora. Traditional topic modeling algorithms such as Latent Dirichlet Allocation (LDA), although widely applied in text mining, face limitations in capturing semantic relationships within text documents. BERTopic, created in 2022, leverages advances in deep learning and can capture the contextual relationships between words. In this work, we integrated Artificial Intelligence (AI) modules into LDA and BERTopic and provide a comprehensive comparison on the analysis of prescription opioid-related cardiovascular risks in women. Opioid use can increase the risk of cardiovascular problems in women, such as arrhythmia and hypotension. A total of 1,837 abstracts were retrieved and downloaded from PubMed as of April 2024 using three Medical Subject Headings (MeSH) terms: "opioid," "cardiovascular," and "women." The Machine Learning for Language Toolkit (MALLET) was employed for the implementation of LDA. BioBERT was used for document embedding in BERTopic. Eighteen was selected as the optimal topic number for MALLET and 23 for BERTopic. ChatGPT-4-Turbo was integrated to interpret and compare the results. The short descriptions created by ChatGPT for each topic from LDA and BERTopic were highly correlated, and the performance accuracies of LDA and BERTopic were similar as determined by expert manual reviews of the abstracts grouped by their predominant topics. The t-SNE (t-distributed Stochastic Neighbor Embedding) plots showed that the clusters created by BERTopic were more compact and well separated, representing improved coherence and distinctiveness between topics. Our findings indicate that AI algorithms can augment both traditional and contemporary topic modeling techniques.
In addition, BERTopic provides a connection port for ChatGPT-4-Turbo or other large language models in its algorithm for automatic interpretation, whereas with LDA interpretation must be performed manually and special procedures are needed for data pre-processing and stop-word exclusion. Therefore, while LDA remains valuable for large-scale text analysis under resource constraints, AI-assisted BERTopic offers significant advantages, providing enhanced interpretability and improved semantic coherence for extracting valuable insights from textual data.
- Research Article
- 10.1108/jhti-07-2023-0460
- Feb 2, 2024
- Journal of Hospitality and Tourism Insights
Purpose: This study aims to use the voice of the customer (VoC) strategy to collect user-generated content (UGC), compare customer expectations with reality, make the necessary improvements for the business, and create personalized strategies for each customer to maximize revenue, with a focus on the hospitality industry in the Vietnamese market. Design/methodology/approach: This study proposes a synthesis of techniques for a deep understanding of the VoC based on online reviews in the hospitality industry. First, 409,054 comments were collected from websites in the hospitality sector. Second, the data were organized, stored, cleaned, analyzed, and evaluated. Next, the research used business intelligence (BI) solutions integrating three models, net promoter score (NPS), a graph model, and latent Dirichlet allocation (LDA), based on natural language processing (NLP) techniques, experimenting on Vietnamese and English data to explore the multidimensional voice of the customer. Finally, a dashboard system was implemented to visualize analysis results and recommend marketing strategies to improve product and service quality. Findings: The experimental results allow analysts and managers to “listen to the customer’s voice” accurately and effectively, and to identify relationships between entities and topics of discussion with positive or negative trends. Originality/value: The novelty of this study is the integration of three models: NPS, a graph model, and LDA. These models are combined within a BI solution using NLP techniques. The study also conducted experiments in both Vietnamese and English, which ensures more effective practical application.
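Of the three integrated models, the net promoter score is the simplest to illustrate. A minimal sketch with invented ratings on the standard 0-10 likelihood-to-recommend scale:

```python
# Minimal net promoter score (NPS) calculation, one of the three models the
# study integrates. Sample ratings are invented.
def nps(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6)."""
    n = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / n

print(nps([10, 9, 9, 8, 7, 6, 3]))  # 3 promoters, 2 detractors of 7
```

In the study's dashboard setting, this score would be tracked per property or per topic, with the graph model and LDA supplying the entity and theme breakdowns.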
- Research Article
- 10.1007/s44212-025-00083-x
- Sep 30, 2025
- Urban Informatics
Urban planners routinely engage with extensive textual materials, such as zoning codes, comprehensive plans, and public comments. The large volume of textual data being generated in cities offers new opportunities to use emerging computational technologies to process and analyze it. In this research, we examine how natural language processing (NLP) techniques and large language models (LLMs) compare to human qualitative coding in identifying public sentiment and topics in public comments about the Minneapolis 2040 upzoning. We use a custom rubric developed in collaboration with urban planners to assess outputs across these different methods, scoring them on factors such as accuracy, convergence, creativity, efficiency, and interpretability. Additionally, we conduct interviews with practicing urban planners to understand their perceptions of integrating these computational techniques into their existing workflows. We find that NLP techniques are helpful in providing urban planners with an aerial view of their data but require additional human interpretation. In contrast, LLMs markedly improve efficiency, interpretability, and descriptiveness over traditional NLP techniques but require human validation to address concerns related to social biases and equity. We further find that urban planners are open to using new text-processing technologies but have reservations about entirely outsourcing decision-making to AI tools, viewing AI technologies as “co-pilots” rather than autonomous agents. Our findings underscore the importance of integrating human judgment into the use of computational tools to develop a more informed, equitable, and reflective practice in an era of expanding urban data and computational technologies.
- Research Article
- 10.1108/jap-07-2025-0028
- Nov 14, 2025
- The Journal of Adult Protection
Purpose This study aims to explore in depth the perceptions, attitudes and experiences of older adults residing in an institutional care facility regarding digital care technologies. Based on these findings, it seeks to develop age-sensitive, privacy-oriented and user-centered digital care policies. Design/methodology/approach A total of 43 participants were selected through purposive sampling from a professional elderly care center located in Narlidere, Izmir. Participants were introduced to IoT-based digital care systems through visual scenarios designed to simulate real-life situations. Data were collected via semi-structured, tablet-assisted personal interviews (TAPIs). In addition to thematic analysis, natural language processing (NLP) techniques, including sentiment analysis, Term Frequency-Inverse Document Frequency, topic modeling (Latent Dirichlet Allocation [LDA]), and concept network mapping, were used. Findings Participants evaluated digital technologies not only in terms of functional benefits but also through values such as privacy, emotional security and social interaction. While fall sensors and emergency alert systems received high acceptance, camera-based surveillance applications were largely rejected due to concerns about privacy violations and emotional discomfort. Acceptance levels varied according to participants’ demographic characteristics such as age, gender and educational background. The findings underscore the need to develop context-sensitive and ethically grounded digital care policies informed by users’ lived experiences. Research limitations/implications This study is limited to institutional elder care settings, excluding home-based and rural contexts, which may affect generalizability. The use of qualitative methods and a cross-sectional design limits temporal insights and empirical testing of acceptance models. NLP techniques were applied to a small dataset, restricting automation potential. 
Moreover, the focus on individual perspectives overlooks institutional, legal, and market dynamics. Practical implications Digital care technologies should balance support and autonomy by prioritizing privacy, ease of use, and emotional connection. Nonintrusive tools like fall sensors are more acceptable, while preserving human interaction is essential to prevent isolation. Age-sensitive literacy and consent-based models are key to adoption. Social implications Digital care systems affect not only functionality but also emotional well-being, privacy, and social connection. Older adults favor supportive technologies like fall detectors while rejecting surveillance tools due to ethical concerns. These findings stress the need for human-centered, trust-based designs and inclusive policies that align with users’ social and emotional realities. Originality/value This study presents a comprehensive analysis that integrates individual experiences of digital aging with spatial and emotional contexts, enriched through NLP techniques. By developing policy recommendations based on the preferences of older adults in institutional care settings, it addresses a critical gap in the literature and proposes culturally adaptable, rights-based strategies for digital care in Turkiye.