Legislative agenda-setting power of social media
Abstract: In an effort to understand the legislative agenda-setting impact of social media content, the present study analyzes political discourse on Twitter regarding the George Floyd Justice in Policing Act. We use a Latent Dirichlet Allocation (LDA) topic model to dissect the Twitter conversation aimed at Representative Karen Bass, the sponsor of H.R.7120, in the weeks leading up to the bill’s filing. Our analysis of nearly 68,000 tweets posted in the days before the filing reveals that constituents strongly urged the Representative to work on legislation targeting police reform, offering evidence of how constituents demanded, and drove, legislative action. Considering our findings, we argue that there is considerable potential for social media to serve as an amplifier of social issues and concerns among constituents. Through this process, we posit that social media can prove to be a vital catalyst in social justice reform.
- Conference Article
5
- 10.1109/iccst50977.2020.00094
- Oct 1, 2020
Physical bookstores are leaders of cultural trends, carriers of national reading, and providers of public cultural services, and they embody the cultural soft power of a city. The wide use of Internet e-commerce platforms and changes in people's reading habits have had a great impact on physical bookstores, resulting in poor overall profitability. To support the sustainable development of physical bookstores, we mine and analyze consumer-generated online reviews. In this paper, a sentiment analysis method based on Hybrid LSTM-CNN (Hybrid Long Short-Term Memory-Convolutional Neural Network) and the LDA (Latent Dirichlet Allocation) topic model is proposed. First, the Hybrid LSTM-CNN model is used to classify reviews, and then the LDA topic model is used to extract features of positive and negative reviews. The results show that the Hybrid LSTM-CNN model performs better than the classic LSTM and CNN in sentiment classification. The LDA model reveals that consumers have a positive attitude towards the products, context and ambiance of physical bookstores, and a negative attitude towards price and service. This method studies consumer-generated online reviews of physical bookstores from two aspects, sentiment classification and topic mining, which can help physical bookstore operators learn of consumer feedback in a timely manner.
- Research Article
- 10.1177/10690727241289125
- Oct 14, 2024
- Journal of Career Assessment
For more than a century, self-report inventories have been the traditional method for assessing vocational interests. Little research has examined the use of machine learning techniques, such as natural language processing (NLP), in interest assessment. This paper explores the extent to which natural language on social media can be used to predict individuals’ self-ratings on eight basic interests representing the SETPOINT model: Agriculture, Engineering, Human Resources, Life Science, Management/Administration, Mechanics/Electronics, Media, and Social Science. Leveraging closed- (Linguistic Inquiry and Word Count; LIWC) and open-vocabulary NLP approaches (Latent Dirichlet allocation (LDA) topic modeling), we analyzed 3.2 million Facebook posts from 2,834 participants who completed a 32-item basic interest measure. We found that the convergent validities of these NLP approaches for predicting vocational interest scores (LIWC: [Formula: see text] = 0.19; LDA topic modeling: [Formula: see text] = 0.24) are comparable to prior research on language-based personality assessments. Our study also revealed largely face-valid language markers that characterize different basic interests. Implications for developing language-based interest assessments for applied settings (e.g., career guidance and employee selection) and future research directions are discussed.
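As a rough illustration of what a convergent validity coefficient measures, the sketch below computes a Pearson correlation between language-based predictions and self-reported scores. This assumes the statistic reported in the abstract is a correlation of this kind (the symbol was lost in extraction); the function name `pearson_r` and the toy scores are hypothetical and not taken from the study.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_o)


# Hypothetical toy data: model-predicted vs. self-rated interest scores.
predicted_scores = [2.1, 3.4, 2.9, 4.0, 3.1]
self_ratings = [2.0, 3.0, 3.5, 4.5, 2.5]
r = pearson_r(predicted_scores, self_ratings)
```

A value near 0 would mean the language-based prediction carries little information about the self-rating, while values closer to 1 indicate stronger agreement.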
- Research Article
8
- 10.2196/44356
- Jun 9, 2023
- Journal of Medical Internet Research
Digital misinformation, primarily on social media, has led to harmful and costly beliefs in the general population. Notably, these beliefs have resulted in public health crises to the detriment of governments worldwide and their citizens. However, public health officials need access to a comprehensive system capable of mining and analyzing large volumes of social media data in real time. This study aimed to design and develop a big data pipeline and ecosystem (UbiLab Misinformation Analysis System [U-MAS]) to identify and analyze false or misleading information disseminated via social media on a certain topic or set of related topics. U-MAS is a platform-independent ecosystem developed in Python that leverages the Twitter V2 application programming interface and the Elastic Stack. The U-MAS expert system has 5 major components: data extraction framework, latent Dirichlet allocation (LDA) topic model, sentiment analyzer, misinformation classification model, and Elastic Cloud deployment (indexing of data and visualizations). The data extraction framework queries the data through the Twitter V2 application programming interface, with queries identified by public health experts. The LDA topic model, sentiment analyzer, and misinformation classification model are independently trained using a small, expert-validated subset of the extracted data. These models are then incorporated into U-MAS to analyze and classify the remaining data. Finally, the analyzed data are loaded into an index in the Elastic Cloud deployment and can then be presented on dashboards with advanced visualizations and analytics pertinent to infodemiology and infoveillance analysis. U-MAS performed efficiently and accurately. Independent investigators have successfully used the system to extract significant insights into a fluoride-related health misinformation use case (2016 to 2021). 
The system is currently used for a vaccine hesitancy use case (2007 to 2022) and a heat wave-related illnesses use case (2011 to 2022). Each component in the system for the fluoride misinformation use case performed as expected. The data extraction framework handles large amounts of data within short periods. The LDA topic models achieved relatively high coherence values (0.54), and the predicted topics were accurate and befitting to the data. The sentiment analyzer performed at a correlation coefficient of 0.72 but could be improved in further iterations. The misinformation classifier attained a satisfactory correlation coefficient of 0.82 against expert-validated data. Moreover, the output dashboard and analytics hosted on the Elastic Cloud deployment are intuitive for researchers without a technical background and comprehensive in their visualization and analytics capabilities. In fact, the investigators of the fluoride misinformation use case have successfully used the system to extract interesting and important insights into public health, which have been published separately. The novel U-MAS pipeline has the potential to detect and analyze misleading information related to a particular topic or set of related topics.
- Research Article
12
- 10.1186/s40537-022-00605-3
- Apr 28, 2022
- Journal of Big Data
Big data analytics utilizes different techniques to transform large volumes of big datasets. The analytics techniques utilize various computational methods such as Machine Learning (ML) for converting raw data into valuable insights. The ML assists individuals in performing work activities intelligently, which empowers decision-makers. Since academics and industry practitioners have growing interests in ML, various existing review studies have explored different applications of ML for enhancing knowledge about specific problem domains. However, in most of the cases existing studies suffer from the limitations of employing a holistic, automated approach. While several researchers developed various techniques to automate the systematic literature review process, they also seemed to lack transparency and guidance for future researchers. This research aims to promote the utilization of intelligent literature reviews for researchers by introducing a step-by-step automated framework. We offer an intelligent literature review to obtain in-depth analytical insight of ML applications in the clinical domain to (a) develop the intelligent literature framework using traditional literature and Latent Dirichlet Allocation (LDA) topic modeling, (b) analyze research documents using traditional systematic literature review revealing ML applications, and (c) identify topics from documents using LDA topic modeling. We used a PRISMA framework for the review to harness samples sourced from four major databases (e.g., IEEE, PubMed, Scopus, and Google Scholar) published between 2016 and 2021 (September). The framework comprises two stages—(a) traditional systematic literature review consisting of three stages (planning, conducting, and reporting) and (b) LDA topic modeling that consists of three steps (pre-processing, topic modeling, and post-processing). The intelligent literature review framework transparently and reliably reviewed 305 sample documents.
- Research Article
5
- 10.1057/s41599-024-03066-6
- Apr 30, 2024
- Humanities and Social Sciences Communications
In the digital age, as social media evolves into a new and significant centre for the dissemination of Chinese folk beliefs, the Malaysian Chinese have actively shared information about these folk beliefs on their social media platforms. The dissemination has transcended regional barriers, encouraging more Malaysian Chinese across various states to actively participate in public discussions on this topic. This study delves into Malaysian Chinese folk beliefs by analysing data from Facebook. A comprehensive examination of 4012 text posts was conducted using the latent Dirichlet allocation (LDA) model for topic modelling. The analysis identified four main themes on social media: ‘Practitioners Worship’, ‘Temple Activities’, ‘Deity Legends’, and ‘Merchandise about Deity Statues’. Based on integrating social construction theory and media ecology theory, the study first explores the varied constructors, including practitioners, temple organisations, media organisations, and merchants. Secondly, Malaysian Chinese folk beliefs on social media present characteristics of utilitarianism, regional diversity, multiple social functions, flowing realms, strong Taoist elements, commercialisation, and a close relationship with the Spring Festival. Furthermore, ‘Safety and Peace’, ‘Pray for Demands’, and ‘Merits and Virtues’ form an interconnected semantic nexus. Hence, the findings theoretically highlight the interaction and significance of social media in the construction and practice of folk beliefs within the Malaysian Chinese community. Practically, this research provides valuable insights into the understanding and dissemination of Malaysian Chinese religious culture in the digital era.
- Research Article
- 10.2196/77424
- Aug 21, 2025
- Journal of Medical Internet Research
Background: Mpox has reemerged as a global public health concern. With the growing reliance on social media for health information dissemination, understanding public perception through these platforms is essential for designing effective health promotion strategies. Objective: This study analyzes TikTok data related to mpox using Latent Dirichlet Allocation (LDA) topic modeling. This paper aims to extract key topics and inform targeted health promotion strategies for mpox prevention and control. Methods: Using the “Aisou Jisou” system, we collected TikTok data containing the keyword “Mpox” from April 1, 2022, to March 31, 2025. The dataset comprised 25,672 text entries and associated search terms. We analyzed trends in the Search Index and Target Group Index (TGI) across time, gender, age groups, and provinces. LDA topic modeling was applied to identify latent topics within the text data, and topic evolution was examined during 4 peak months of the Search Index. Results: A total of 4 major Search Index peaks were identified on TikTok in China: May 2022, July 2023, August 2024, and February 2025. These peaks aligned with key global and national mpox events, including WHO’s declaration of a global mpox outbreak in May 2022 and the detection of clade Ib mpox in China in January 2025. TGI analysis revealed that users aged 18‐23 years exhibited the highest engagement. Spatially, Beijing, Tianjin, and Jilin recorded the highest cumulative TGI values (5922.38, 5692.41, and 3579.90, respectively). LDA topic modeling identified 8 primary topics, including transmission and prevention, vaccine concerns, and misinformation. Public attention evolved from general disease knowledge toward issues of stigmatization and vaccine distrust over time.
Sankey diagrams illustrated shifts in public attention across topics at different Search Index peaks, with “Mpox Transmission and Prevention” receiving the most attention in May 2022 and “Mpox Vaccination and Infection Prevention” in February 2025. Conclusions: TikTok provides real-time insights into public attention during mpox outbreaks, but can also propagate misinformation and stigmatizing narratives. Public health authorities should leverage these platforms for timely communication, actively address misinformation, and mitigate social bias. Tailored strategies are needed to enhance health literacy, minimize stigma, and strengthen outbreak preparedness and response. This study highlights the dual role of social media as both an information source and a potential vector for misinformation, emphasizing the necessity for active monitoring and regulation by health authorities to ensure the accuracy and reliability of disseminated health information.
- Preprint Article
- 10.2196/preprints.69983
- Dec 12, 2024
BACKGROUND With the widespread adoption of the internet and smart devices, chatbots have emerged as significant auxiliary tools for public health activities. Despite the increasing application of chatbots in the medical field, comprehensive assessments of research topics and trends in this area remain relatively scarce. OBJECTIVE This study analyzed the application topics of chatbot technology in the medical field and explored the trends of these topics across different time periods, various journals, and different countries. METHODS In this study, a bibliometric approach was used to systematically search the PubMed, CINAHL, Web of Science and Embase databases for literature on medicine and chatbots between 2004 and 2024. By applying Latent Dirichlet Allocation (LDA) topic modeling, the study identified and analyzed the thematic applications of chatbots in the medical field, and explored the temporal evolution of these topics as well as their distribution characteristics across journals and countries. RESULTS We ultimately identified 3,029 articles for analysis. Utilizing the Latent Dirichlet Allocation (LDA) topic modeling technique, we identified nine core topics from the abstracts: ChatGPT medical quiz accuracy research, digital healthcare support assistants, mental health intervention research, epidemic health conversation application research, cancer patient diagnosis and treatment care, artificial intelligence (AI) healthcare education potential research, natural language processing models, human-computer interaction emotion research, and AI reading assistance systems. This study also found that these topics have shown diverse developmental trajectories over time, reflecting the evolution of research interests. In addition, researchers from different journals and countries have shown significant differences in the topics they focus on. 
CONCLUSIONS This study analyzed the topic distribution, temporal trends, journal, and country distribution characteristics of chatbots in the medical field. The results revealed popular and less researched topics, as well as emerging directions and trends, providing researchers with a tool for rapid identification. These findings not only provide guidance for researchers in selecting research directions but also offer references for journals and countries in determining research priorities, formulating strategic plans, and promoting international collaborative research.
- Conference Article
15
- 10.2991/sekeie-14.2014.47
- Jan 1, 2014
The LDA (Latent Dirichlet Allocation) topic model has been widely applied to text clustering owing to its efficient dimension reduction. The prevalent method is to model the text set with an LDA topic model, perform inference by Gibbs sampling, and calculate text similarity with the JS (Jensen-Shannon) distance. However, the JS distance cannot distinguish semantic associations among text topics. To address this defect, a new text similarity algorithm based on a hidden topic model and word co-occurrence analysis is introduced. Tests are carried out to verify the clustering effect of the improved algorithm. Results show that this method can effectively improve text similarity results and text clustering accuracy. Keywords: topic model; LDA (Latent Dirichlet Allocation); JS (Jensen-Shannon) distance; word co-occurrence; similarity
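The limitation described above can be seen in a minimal sketch of the Jensen-Shannon distance between document-topic distributions. The function names and toy distributions are hypothetical, and the base-2 logarithm is one common convention rather than something stated in the abstract. The point is that the distance depends only on how probability mass is split across topic indices, so it cannot tell whether two different topics are semantically close.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; skips zero entries of p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by 1 with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_distance(p, q):
    """Square root of the JS divergence, which behaves as a metric."""
    return math.sqrt(max(0.0, js_divergence(p, q)))


# Hypothetical document-topic distributions over three topics.
doc_a = [1.0, 0.0, 0.0]  # entirely topic 0
doc_b = [0.0, 1.0, 0.0]  # entirely topic 1
doc_c = [0.0, 0.0, 1.0]  # entirely topic 2

# Even if topics 0 and 1 were closely related and topic 2 unrelated,
# the JS distance treats both pairs identically.
d_ab = js_distance(doc_a, doc_b)
d_ac = js_distance(doc_a, doc_c)
```

Because `d_ab` and `d_ac` are equal here, any extra signal about topic relatedness has to come from elsewhere, which is the role the abstract assigns to word co-occurrence analysis.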
- Conference Article
2
- 10.1109/icac51239.2020.9357264
- Dec 10, 2020
The price fluctuation of vegetables is one of the economic problems faced by every country, including Sri Lanka. Many factors such as environmental conditions as well as supply, demand, social, cultural, and political situations of the country cause the price of vegetables to fluctuate. Nowadays, social media represents public opinion about current events. Twitter has become one of the fastest social media platforms for getting the latest and historical news and it can be used to track historical trends in different fields. In this paper, we applied the Latent Dirichlet Allocation (LDA) topic modeling algorithm to determine the topics of the tweets about Sri Lanka when the prices of vegetables were very high and low. Through a manual analysis of extracted topics, we identified the situation in the country during a selected period and how it has impacted the vegetable prices. According to the results, vegetable prices are on the rise during the festive season in Sri Lanka. It also appears that political factors, such as elections, do not have a major impact on vegetable prices. It seems that vegetable prices have gone up during the unstable or chaotic periods in Sri Lanka.
- Research Article
59
- 10.2478/dim-2020-0023
- Jan 1, 2021
- Data and Information Management
Exploring Public Response to COVID-19 on Weibo with LDA Topic Modeling and Sentiment Analysis
- Research Article
7
- 10.3389/fpsyg.2022.986838
- Dec 28, 2022
- Frontiers in Psychology
Since digital technology has had a significant impact on the fashion industry, digital fashion has become a hot topic in today's society. Currently, research on digital fashion is focused on the transformation of enterprise marketing strategies and the discussion of digital technology. Despite this, the current study does not include an analysis of the audience's emotional and cognitive responses to digital fashion on social networking platforms. A comprehensive analysis and discussion of 52,891 posts about digital fashion and virtual fashion published on social networking sites was conducted using k-means clustering analysis, Latent Dirichlet Allocation (LDA) topic modeling, and sentiment analysis in this study. The study examines the public's perception and hot topics about digital fashion, as well as the industry's development situation and trends. According to the findings, both positive and neutral emotions accompany the public's attitude toward digital fashion. There is a wide range of topics covered in the discussion. Innovations in digital technology have impacted the creation of jobs, talent demand, marketing strategies, profit forms, and industrial chain innovation of fashion-related businesses. Researchers in related fields will find this study useful not only as a reference for research methods and directions, but also as a source of references for research methodology. A case study and data reference will also be provided to industry practitioners.
- Research Article
5
- 10.1186/s13634-021-00761-3
- Jul 20, 2021
- EURASIP Journal on Advances in Signal Processing
Existing intelligent software defect classification approaches do not consider radar characteristics and prior statistical information. Thus, when these approaches are applied to radar software testing and validation, the precision and recall of defect classification are poor, which reduces the effectiveness of reusing software defects. To solve this problem, a new intelligent defect classification approach based on the latent Dirichlet allocation (LDA) topic model is proposed for radar software in this paper. The proposed approach includes the defect text segmentation algorithm based on the dictionary of radar domain, the modified LDA model combining radar software requirement, and the top acquisition and classification approach of radar software defect based on the modified LDA model. The proposed approach is applied to typical radar software defects to validate its effectiveness and applicability. The application results illustrate that the prediction precision and recall of the proposed approach are improved by 15 to 20% compared with other defect classification approaches. Thus, the proposed approach can be applied effectively to the segmentation and classification of radar software defects, improving how adequately defects in radar software are identified.
- Research Article
1
- 10.2139/ssrn.3708327
- Jan 1, 2020
- SSRN Electronic Journal
Research publications related to the novel coronavirus disease COVID-19 are rapidly growing in number. However, current online literature hubs, even with artificial intelligence, are inadequate for identifying the relative strength of research topics. Hence, we aimed to develop a comprehensive Latent Dirichlet Allocation (LDA) topic model using natural language processing (NLP) techniques, provide visualisations for temporal trends, and apply our methodology to improve existing online literature hubs. Using the search term “COVID”, abstracts were extracted from PubMed®, from January to July 2020 (N=16346). An LDA topic model was trained on 81% of abstracts. Weekly temporal trends were visualised as a heatmap on all abstracts. Then, we tested our methodology on over 23,000 abstracts gathered from January 2020 to September 2020 from LitCovid, a literature hub from the National Center for Biotechnology Information. We use our topic model to subdivide LitCovid’s eight categories into corresponding LDA topics. The optimised LDA topic model, created using PubMed® data, produced 25 comprehensive topics with no significant overlap. There were temporal changes for topics: prominence of “Mental Health” and “Socioeconomic Impact” increased, “Genome Sequence” decreased, and “Epidemiology” remained relatively constant. We identified inadequate representation of “Airborne Transmission Protection”. Importantly, research on masks and PPE is skewed towards clinical applications with a lack of population-based epidemiological research. Our methodology, when applied to LitCovid, identified important topics within each LitCovid category. For example, “Case Report” was split into topics such as “Pulmonary” and “Oncology” as well as the under-represented topics “Haematology” and “Gastroenterology”. Our work allows for comprehensive topic identification and intuitive visualisation of temporal trends in COVID-19 research.
Implementation of the methodology complements existing online literature hubs and identifies underrepresented topics such as population-based studies on masks that may be of significant public interest. Funding Statement: None to declare. Declaration of Interests: There are no conflicts of interest.
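The weekly temporal trends mentioned in this abstract can be pictured as a table of topic shares per week, which is the kind of matrix a heatmap would then display. The sketch below is an assumption about one simple way to build such a table from per-abstract topic labels; the function `weekly_topic_shares`, the week labels, and the toy assignments are hypothetical and do not reproduce the authors' pipeline.

```python
from collections import Counter, defaultdict


def weekly_topic_shares(assignments):
    """Turn (week, topic) pairs into {week: {topic: share of that week's abstracts}}."""
    counts = defaultdict(Counter)
    for week, topic in assignments:
        counts[week][topic] += 1
    shares = {}
    for week, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[week] = {t: n / total for t, n in topic_counts.items()}
    return shares


# Hypothetical dominant-topic labels for a handful of abstracts.
toy_assignments = [
    ("2020-W10", "Epidemiology"),
    ("2020-W10", "Genome Sequence"),
    ("2020-W20", "Epidemiology"),
    ("2020-W20", "Mental Health"),
    ("2020-W20", "Mental Health"),
    ("2020-W20", "Mental Health"),
]
shares = weekly_topic_shares(toy_assignments)
```

Using shares rather than raw counts keeps weeks comparable when the number of published abstracts grows over time.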
- Research Article
7
- 10.4258/hir.2021.27.3.200
- Jul 1, 2021
- Healthcare Informatics Research
Objectives: The main aim of this study was to use text mining on social media to analyze information and gain insight into the health-related concerns of thalassemia patients, thalassemia carriers, and their caregivers. Methods: Posts from two Facebook groups whose members consisted of thalassemia patients, thalassemia carriers, and caregivers in Malaysia were extracted using the Data Miner tool. In this study, a new framework known as Malay-English social media text pre-processing was proposed for performing the steps of pre-processing the noisy mixed language (Malay-English language) of social media posts. Topic modeling was used to identify hidden topics within posts shared among members. Three different topic models (latent Dirichlet allocation (LDA) in GenSim, LDA in MALLET, and latent semantic analysis) were applied to the dataset with and without stemming using Python. Results: LDA in MALLET without stemming was found to be the best topic model for this dataset. Eight topics were identified within the posts shared by members. Of those eight topics, four were newly discovered by this study, and four others corresponded to the findings of previous studies that used an interview approach. Conclusions: Topic 2 (the challenges faced by thalassemia patients) was found to be the topic with the highest attention and engagement. Healthcare practitioners and other concerned parties should make an effort to build a stronger support system related to this issue for those affected by thalassemia.
- Research Article
6
- 10.2196/29011
- Dec 7, 2021
- JMIR Infodemiology
Background: In 2018, JUUL Labs Inc, a popular e-cigarette manufacturer, announced it would substantially limit its social media presence in compliance with the Food and Drug Administration’s (FDA) call to curb underage e-cigarette use. However, shortly after the announcement, a series of JUUL-related hashtags emerged on various social media platforms, calling the effectiveness of the FDA’s regulations into question. Objective: The purpose of this study is to determine whether hashtags remain a common venue to market age-restricted products on social media. Methods: We used Twitter’s standard application programming interface to download the 3200 most-recent tweets originating from JUUL Labs Inc’s official Twitter Account (@JUULVapor), and a series of tweets (n=28,989) from other Twitter users that either contained #JUUL or mentioned JUUL in the tweet text. We ran exploratory (10×10) and iterative Latent Dirichlet Allocation (LDA) topic models to compare @JUULVapor’s content versus our hashtag corpus. We qualitatively deliberated topic meanings and substantiated our interpretations with tweets from either corpus. Results: The topic models generated for @JUULVapor’s timeline seemingly alluded to compliance with the FDA’s call to prohibit marketing of age-restricted products on social media. However, the topic models generated for the hashtag corpus of tweets from other Twitter users contained several references to flavors, vaping paraphernalia, and illicit drugs, which may be appealing to younger audiences. Conclusions: Our findings underscore the complicated nature of social media regulation. Although JUUL Labs Inc seemingly complied with the FDA to limit its social media presence, JUUL and other e-cigarette manufacturers are still discussed openly in social media spaces. Much discourse about JUUL and e-cigarettes is spread via hashtags, which allow messages to reach a wide audience quickly.
This suggests that social media regulations on manufacturers cannot prevent e-cigarette users, influencers, or marketers from spreading information about e-cigarette attributes that appeal to the youth, such as flavors. Stricter protocols are needed to regulate discourse about age-restricted products on social media.