Trends in IoT applications in smart campuses: A topic modeling approach
This article provides an in-depth examination of the emerging field of smart campuses and IoT by investigating the thematic landscape of existing literature. The study employs Latent Dirichlet Allocation (LDA) topic modeling to analyze the thematic landscape, identifying key research areas and themes. The dataset consists of 507 research articles retrieved from the Scopus database, covering the period from 2014 to 2023. The LDA model was optimized with α = 0.1 and β = 0.01, and the number of topics (K) was determined as 8 based on the c_v coherence score (0.393), ensuring robust topic extraction. This analysis offers a comprehensive overview of the current state of smart campuses and IoT research, revealing diverse research themes such as “Comprehensive Technology Integration and Management,” “Artificial Intelligence and Cloud-Based Systems,” “Educational Technologies and Innovative Applications,” “Low Power Networks,” “Smart Education Environments,” “Safe and Flexible Smart Campus Architecture,” “Energy Efficient Smart Buildings,” and “Sustainable Smart Education and Urban Development.” These themes highlight the complexity and multidimensionality of the research in this field. While the study provides valuable insights, it is important to acknowledge certain limitations, such as potential biases in dataset selection and the interpretative nature of topic modeling. Moreover, the LDA model’s dependence on predefined topic numbers might limit its ability to capture emerging research themes comprehensively. Future research could refine these findings by incorporating additional datasets, employing alternative modeling approaches such as Hierarchical Dirichlet Process (HDP), and integrating hybrid methodologies to enhance thematic robustness. The study’s findings underscore the diversity and richness of research themes, offering valuable insights for researchers and practitioners, and facilitating more targeted and effective future research and innovation.
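To make the roles of α, β, and K in the abstract above concrete, here is a minimal sketch of LDA in pure Python. It uses collapsed Gibbs sampling, which is an assumption made for illustration: the abstract does not say which inference method or library the authors used. The function name `lda_gibbs` and its parameters are hypothetical. α smooths the per-document topic counts and β smooths the per-topic word counts.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (lists of words).

    Illustrative sketch only; alpha and beta default to the values quoted in
    the abstract, but the sampler itself is an assumed inference method.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    word_id = {w: i for i, w in enumerate(vocab)}

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed document-topic proportions; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab
```

Calling `lda_gibbs(tokenized_docs, num_topics=8)` would mirror the K = 8 setting. Choosing K by c_v coherence, as the study did, would mean repeating the fit for several values of K and scoring each result with a separate coherence measure, which this sketch does not implement.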
- Research Article
11
- 10.3390/make5020029
- May 14, 2023
- Machine Learning and Knowledge Extraction
Fields in the social sciences, such as education research, have started to expand the use of computer-based research methods to supplement traditional research approaches. Natural language processing techniques, such as topic modeling, may support qualitative data analysis by providing early categories that researchers may interpret and refine. This study contributes to this body of research and answers the following research questions: (RQ1) What is the relative coverage of the latent Dirichlet allocation (LDA) topic model and human coding in terms of the breadth of the topics/themes extracted from the text collection? (RQ2) What is the relative depth or level of detail among identified topics using LDA topic models and human coding approaches? A dataset of student reflections was qualitatively analyzed using LDA topic modeling and human coding approaches, and the results were compared. The findings suggest that topic models can provide reliable coverage and depth of themes present in a textual collection comparable to human coding but require manual interpretation of topics. The breadth and depth of human coding output is heavily dependent on the expertise of coders and the size of the collection; these factors are better handled in the topic modeling approach.
- Preprint Article
- 10.2196/preprints.69983
- Dec 12, 2024
BACKGROUND With the widespread adoption of the internet and smart devices, chatbots have emerged as significant auxiliary tools for public health activities. Despite the increasing application of chatbots in the medical field, comprehensive assessments of research topics and trends in this area remain relatively scarce.
OBJECTIVE This study analyzed the application topics of chatbot technology in the medical field and explored the trends of these topics across different time periods, various journals, and different countries.
METHODS In this study, a bibliometric approach was used to systematically search the PubMed, CINAHL, Web of Science and Embase databases for literature on medicine and chatbots between 2004 and 2024. By applying Latent Dirichlet Allocation (LDA) topic modeling, the study identified and analyzed the thematic applications of chatbots in the medical field, and explored the temporal evolution of these topics as well as their distribution characteristics across journals and countries.
RESULTS We ultimately identified 3,029 articles for analysis. Utilizing the Latent Dirichlet Allocation (LDA) topic modeling technique, we identified nine core topics from the abstracts: ChatGPT medical quiz accuracy research, digital healthcare support assistants, mental health intervention research, epidemic health conversation application research, cancer patient diagnosis and treatment care, artificial intelligence (AI) healthcare education potential research, natural language processing models, human-computer interaction emotion research, and AI reading assistance systems. This study also found that these topics have shown diverse developmental trajectories over time, reflecting the evolution of research interests. In addition, researchers from different journals and countries have shown significant differences in the topics they focus on.
CONCLUSIONS This study analyzed the topic distribution, temporal trends, journal, and country distribution characteristics of chatbots in the medical field. The results revealed popular and less researched topics, as well as emerging directions and trends, providing researchers with a tool for rapid identification. These findings not only provide guidance for researchers in selecting research directions but also offer references for journals and countries in determining research priorities, formulating strategic plans, and promoting international collaborative research.
- Research Article
7
- 10.7717/peerj-cs.1459
- Jul 11, 2023
- PeerJ Computer Science
An immense volume of digital documents exists online and offline with content that can offer useful information and insights. Utilizing topic modeling enhances the analysis and understanding of digital documents. Topic modeling discovers latent semantic structures or topics within a set of digital textual documents. The Internet of Things, Blockchain, recommender system, and search engine optimization applications use topic modeling to handle data mining tasks, such as classification and clustering. The usefulness of topic models depends on the quality of the resulting term patterns and topics. Topic coherence is the standard metric to measure the quality of topic models. Previous studies build topic models to work on conventional documents in general, and these models underperform when applied to web content data because conventional and HTML documents differ in structure. Neglecting the unique structure of web content leads to missing otherwise coherent topics and, therefore, low topic quality. This study aims to propose an innovative topic model to learn coherent topics in web content data. We present the HTML Topic Model (HTM), a web content topic model that takes into consideration the HTML tags to understand the structure of web pages. We conducted two series of experiments to demonstrate the limitations of the existing topic models and examine the topic coherence of the HTM against the widely used Latent Dirichlet Allocation (LDA) model and its variants, namely the Correlated Topic Model, the Dirichlet Multinomial Regression, the Hierarchical Dirichlet Process, the Hierarchical Latent Dirichlet Allocation, the pseudo-document based Topic Model, and the Supervised Latent Dirichlet Allocation models. The first experiment demonstrates the limitations of the existing topic models when applied to web content data and, therefore, the essential need for a web content topic model.
When applied to web data, the overall performance dropped an average of five times and, in some cases, up to approximately 20 times lower than when applied to conventional data. The second experiment then evaluates the effectiveness of the HTM model in discovering topics and term patterns of web content data. The HTM model achieved an overall 35% improvement in topic coherence compared to the LDA.
- Research Article
9
- 10.3905/jfds.2019.1.011
- Sep 4, 2019
- The Journal of Financial Data Science
The authors explore how topic modeling can be used to automate the categorization of initial coin offerings (ICOs) into different topics (e.g., finance, media, information, professional services, health and social, natural resources) based solely on the content within the whitepapers. This tool has been developed by fitting a latent Dirichlet allocation (LDA) model to the text extracted from the ICO whitepapers. After evaluating the automated categorization of whitepapers using statistical and human judgment methods, it is determined that there is enough evidence to conclude that the LDA model appropriately categorizes the ICO whitepapers. The results from a two-population proportion test show a statistically significant difference between topics in the success of an ICO being funded, indicating that the topics are usefully differentiated and suggesting that the topic model could be used to help predict whether an ICO will be successful.
TOPICS: Statistical methods, simulations, big data/machine learning
Key Findings
• Categorization of ICO whitepapers can be done via topic modeling with the latent Dirichlet allocation (LDA) model.
• Statistical and human judgment methods confirm that there is enough evidence to conclude that the LDA model appropriately categorizes ICO whitepapers.
• Statistical tests suggest that the categorization results from the LDA model provide useful information for predicting whether an ICO will be successfully funded.
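The two-population proportion test mentioned in the abstract above can be sketched as a pooled two-sided z-test. This is a generic textbook form, not necessarily the exact procedure the authors ran. The function name and any counts passed to it are hypothetical; the abstract reports no funding counts.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-sided z-test for the difference between two proportions.

    Hypothetical helper: e.g. funded ICOs out of all ICOs in topic A vs. topic B.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled success rate under the null hypothesis p_a == p_b.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value for a pair of topics would indicate that their funding success rates differ by more than chance alone would explain, which is the kind of evidence the abstract describes.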
- Research Article
1
- 10.34123/icdsos.v2021i1.52
- Jan 4, 2022
- Proceedings of The International Conference on Data Science and Official Statistics
Knowledge management is an important activity in improving the performance of an organization. BPS Statistics Indonesia has recently implemented such a system to improve the quality and efficiency of business processes. The purposes of this research are: 1) implementing topic modelling on the BPS Knowledge Management System to identify groups of document topics; 2) recommending the best topic modelling method; 3) building a topic modelling web service for BPS that includes a data preprocessing function and a topic group recommendation function. This study applies the Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) topic modelling methods to determine the best grouping technique for knowledge management systems in BPS Statistics Indonesia. The results show that the LDA model using Mallet is the best model, with 25 topic groups and a coherence score of 0.4803. The performance results suggest that the best modelling method is LDA. The LDA model is then successfully implemented in a RESTful web service that provides preprocessing and topic recommendations for documents entered into the BPS Knowledge Management System.
- Research Article
21
- 10.1016/j.pmcj.2014.07.001
- Jul 10, 2014
- Pervasive and Mobile Computing
Behavior analysis of elderly using topic models
- Conference Article
5
- 10.1109/nkcon56289.2022.10127059
- Nov 20, 2022
In the digital world, the number of research papers is growing exponentially with time, and it is essential to cluster the documents under their respective categories for easier identification and access. However, researchers find it relatively challenging to recognize and categorize their favorite research articles. Though this task can be done manually, it would be tedious and extremely time-consuming. Hence, much research has been done in the field of topic modelling to yield accurate results with a good computation time. The main objective of this paper is to compare two distinct yet widely used topic modelling approaches for research paper classification, which can further group the research papers into their respective classes. The two chosen topic modeling methodologies are Non-Negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA). This paper compares the performance of the LDA model with that of the relatively efficient NMF model on a dataset of 1740 papers extracted from the NYC university website. The average coherence score for the LDA method was 0.5282, with an optimal number of topics of 22, which was slightly higher than the NMF model's coherence score of 0.4937 with an optimal number of topics of 9. To enhance the categorization of LDA, the 22 optimal LDA topics were clustered into 10 using pyLDAvis. On closely comparing both models, LDA performs slightly better than NMF, with a higher coherence score.
- Research Article
1
- 10.1007/s00520-024-08513-3
- Apr 29, 2024
- Supportive care in cancer : official journal of the Multinational Association of Supportive Care in Cancer
This study aimed to assess the different needs of patients with breast cancer and their families in online health communities at different treatment phases using a Latent Dirichlet Allocation (LDA) model. Using Python, breast cancer-related posts were collected from two online health communities: patient-to-patient and patient-to-doctor. After data cleaning, eligible posts were categorized based on the treatment phase. Subsequently, an LDA model identifying the distinct need-related topics for each phase of treatment, including data preprocessing and LDA topic modeling, was established. Additionally, the demographic and interactive features of the posts were manually analyzed. We collected 84,043 posts, of which 9504 posts were included after data cleaning. Early diagnosis and rehabilitation treatment phases had the highest and lowest number of posts, respectively. LDA identified 11 topics: three in the initial diagnosis phase and two in each of the remaining treatment phases. The topics included disease outcomes, diagnosis analysis, treatment information, and emotional support in the initial diagnosis phase; surgical options and outcomes, postoperative care, and treatment planning in the perioperative treatment phase; treatment options and costs, side effects management, and disease prognosis assessment in the non-operative treatment phase; diagnosis and treatment options, disease prognosis, and emotional support in the relapse and metastasis treatment phase; and follow-up and recurrence concerns, physical symptoms, and lifestyle adjustments in the rehabilitation treatment phase. The needs of patients with breast cancer and their families differ across various phases of cancer therapy. Therefore, specific information or emotional assistance should be tailored to each phase of treatment based on the unique needs of patients and their families.
- Book Chapter
- 10.1007/978-981-10-2338-5_49
- Jan 1, 2016
LDA (Latent Dirichlet Allocation) is an unsupervised learning model that extracts hidden topics from text. In this paper, we propose Activity-topic LDA, a novel model that improves on the traditional LDA model by integrating text category information. On the basis of LDA, the proposed method adds tourism activity information and obtains a probability distribution model of tourism activities. Based on this model, we can identify and discover the themes of tourism activities.
- Conference Article
7
- 10.1109/icassp.2018.8462003
- Apr 1, 2018
Latent Dirichlet allocation (LDA) is a statistical model that is often used to discover topics or themes in a large collection of documents. In the LDA model, topics are modeled as discrete distributions over a finite vocabulary of words. The LDA is also a popular choice to model other datasets spanning a discrete domain, such as population genetics and social networks. However, in order to model data spanning a continuous domain with the LDA, discrete approximations of the data need to be made. These discrete approximations to continuous data can lead to loss of information and may not represent the true structure of the underlying data. We present an augmented version of the LDA topic model, where topics are represented using Gaussian mixture models (GMMs), which are multi-modal distributions spanning a continuous domain. This augmentation of the LDA topic model with Gaussian mixture topics is denoted by the GMM-LDA model. We use Gibbs sampling to infer model parameters. We demonstrate the utility of the GMM-LDA model by applying it to the problem of clustering sleep states in electroencephalography (EEG) data. Results are presented demonstrating superior clustering performance with our GMM-LDA algorithm compared to the standard LDA and other clustering algorithms.
- Research Article
- 10.22441/format.2024.v13.i1.005
- Nov 7, 2024
- Format : Jurnal Ilmiah Teknik Informatika
This research applies topic modeling by comparing the Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) models on news tweet data taken from the Detikcom account. The process begins by crawling data over a one-year period, from December 9, 2022 to December 9, 2023, resulting in 958 rows of data. Data pre-processing includes case folding, tokenization, stopword removal, and stemming. After pre-processing, a bag-of-words step calculates the frequency of word occurrences in each document. These word frequencies are the input for building the LSA and LDA models. Each model uses 8 topics, 10 iterations, and a random state of 42. Topics are produced based on the keywords that appear in the modeling results. The two models are evaluated by measuring topic coherence using the c_v value. The LSA model shows a coherence value of 0.5, while the LDA model has a coherence value of 0.45. The evaluation results show that, in this case, the LSA model performs better than the LDA model in terms of topic coherence. As a suggestion for further research, researchers are encouraged to consider other use cases for topic modeling and to explore other modeling tools such as OCTIS. This can expand understanding of the performance of topic modeling algorithms on X news data.
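The pre-processing and bag-of-words steps described in the abstract above can be sketched with the Python standard library alone. The stopword list here is a tiny hypothetical English sample (the study's tweets are from an Indonesian news account), and stemming is deliberately left out because it would need a language-specific stemmer that the abstract does not name.

```python
import re
from collections import Counter

# Hypothetical, illustrative stopword list; a real pipeline would use a full
# list for the language of the corpus.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}

def preprocess(text, stopwords=STOPWORDS):
    """Case folding, tokenization, and stopword removal (no stemming)."""
    folded = text.lower()                        # case folding
    tokens = re.findall(r"[a-z0-9]+", folded)    # simple alphanumeric tokenization
    return [t for t in tokens if t not in stopwords]

def bag_of_words(docs):
    """Return one word-frequency Counter per document."""
    return [Counter(preprocess(doc)) for doc in docs]
```

Each `Counter` maps a token to its frequency in one document, which is the kind of count representation that LSA and LDA implementations typically take as input.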
- Research Article
- 10.3724/sp.j.1087.2010.03401
- Jan 7, 2011
- Journal of Computer Applications
Because available Web document annotation techniques are weak in annotation integrity, the Latent Dirichlet Allocation (LDA) model was applied to semantic annotation. By embedding document domain information into the LDA model, a new LDA model called domain-enabled LDA was introduced. An association between the statistical topic model and the domain ontology was established, so that the implied topics generated could be interpreted by concepts and explicit semantics of the document could be acquired. Because the LDA model assigns a topic to each word in a document, a multi-granularity annotation strategy was proposed. Experiments on 20news-group and WebKB show that the proposed domain-enabled LDA model can improve annotation effectiveness and that the multi-granularity annotation method helps different types of queries in information retrieval.
- Research Article
35
- 10.1109/tcyb.2014.2353577
- Oct 1, 2014
- IEEE Transactions on Cybernetics
Social TV is a social media service via TV and social networks through which TV users exchange their experiences about TV programs that they are viewing. For social TV service, two technical aspects are envisioned: grouping of similar TV users to create social TV communities and recommending TV programs based on group and personal interests for personalizing TV. In this paper, we propose a unified topic model based on grouping of similar TV users and recommending TV programs as a social TV service. The proposed unified topic model employs two latent Dirichlet allocation (LDA) models. One is a topic model of TV users, and the other is a topic model of the description words for viewed TV programs. The two LDA models are then integrated via a topic proportion parameter for TV programs, which enforces the grouping of similar TV users and associated description words for watched TV programs at the same time in a unified topic modeling framework. The unified model identifies the semantic relation between TV user groups and TV program description word groups so that more meaningful TV program recommendations can be made. The unified topic model also overcomes an item ramp-up problem such that new TV programs can be reliably recommended to TV users. Furthermore, from the topic model of TV users, TV users with similar tastes can be grouped as topics, which can then be recommended as social TV communities. To verify our proposed method of unified topic-modeling-based TV user grouping and TV program recommendation for social TV services, in our experiments, we used real TV viewing history data and electronic program guide data from a seven-month period collected by a TV poll agency. The experimental results show that the proposed unified topic model yields an average 81.4% precision for 50 topics in TV program recommendation and its performance is an average of 6.5% higher than that of the topic model of TV users only. 
For TV user prediction with new TV programs, the average prediction precision was 79.6%. Also, we showed the superiority of our proposed model in terms of both topic modeling performance and recommendation performance compared to two related topic models such as polylingual topic model and bilingual topic model.
- Conference Article
15
- 10.1109/ihmsc.2014.130
- Aug 1, 2014
Compared with VSM (Vector Space Model) and graph-ranking models, the LDA (Latent Dirichlet Allocation) model can discover latent topics in the corpus, and these latent topics help sentence-ranking mechanisms form a good summary. In this paper, a new sentence-ranking method based on the LDA model is proposed. The method combines the topic distribution of each sentence with the topic importance of the corpus to calculate the posterior probability of the sentence, and then selects sentences by posterior probability to form a summary. The topic distribution of each sentence represents the likelihood of the sentence belonging to each topic, and topic importance represents the degree to which the topics cover the significant portion of the corpus. The method highlights the latent topics and optimizes the summarization. Experimental results on the DUC2006 dataset show the advantage of the proposed multi-document summarization algorithm: ROUGE values improve compared with methods such as LexRank, LDA-SIBS, and LDA-PGS.
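One simple way to combine a sentence's topic distribution with corpus-level topic importance is a weighted sum over topics. The sketch below assumes that form purely for illustration. The abstract does not give the paper's actual posterior formula, and the function name and inputs are hypothetical.

```python
def rank_sentences(sentence_topics, topic_importance):
    """Score each sentence as sum_k P(topic k | sentence) * importance(k).

    Assumed scoring form for illustration, not the paper's exact formula.
    sentence_topics: one list of topic probabilities per sentence.
    topic_importance: one weight per topic, in the same topic order.
    Returns (sentence indices sorted best-first, list of scores).
    """
    scores = [
        sum(p * w for p, w in zip(dist, topic_importance))
        for dist in sentence_topics
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

A summary would then take the top-ranked sentences until a length budget is reached. Redundancy handling between selected sentences is outside this sketch.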
- Conference Article
7
- 10.1109/pic.2016.7949504
- Dec 1, 2016
With the rapid spread of the Internet and the mobile web, the number of news pages is increasing quickly and news content is becoming highly dynamic. It is difficult for ordinary users to obtain specific information from a mass of news streams, so it is of great research significance to study how to analyze massive news collections and to detect and track news hotspots automatically. This research proposes applying the LDA (Latent Dirichlet Allocation) model to topic detection and tracking. The news articles collected by crawlers are modeled by the LDA model in the form of a document-topic-word distribution. We propose a method to compute the heat of topics based on this distribution and to detect news hotspots. In addition, we track the evolution of topic trends across time slices. Jensen-Shannon distance is used to measure the similarity between topics in order to identify topic inheritance and topic mutation. We conducted experiments on a dataset of 3462 news texts from news portals. The results show that the proposed model is effective both in detecting hotspots and in discovering meaningful topical evolution trends.
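The Jensen-Shannon distance used above to compare topics can be computed directly from two topic-word probability vectors. This is a minimal sketch of the standard definition with base-2 logarithms, so the result lies between 0 and 1. The function names are illustrative, and any threshold separating topic inheritance from topic mutation would be an assumption the abstract does not specify.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the Jensen-Shannon divergence.

    p and q are probability vectors over the same vocabulary, in the same order.
    """
    midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, midpoint) + 0.5 * kl_divergence(q, midpoint)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))
```

Identical distributions give a distance of 0 and distributions with no shared support give 1, so a topic in one time slice can be matched to its closest topic in the next slice by taking the minimum distance.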
- Research Article
- 10.47974/cjsim-2024-11007
- Jan 1, 2025
- COLLNET Journal of Scientometrics and Information Management
- Research Article
- 10.47974/cjsim-2025-03001
- Jan 1, 2025
- COLLNET Journal of Scientometrics and Information Management
- Research Article
- 10.47974/cjsim-2025-03003
- Jan 1, 2025
- COLLNET Journal of Scientometrics and Information Management
- Research Article
- 10.47974/cjsim-2024-021
- Jan 1, 2025
- COLLNET Journal of Scientometrics and Information Management
- Research Article
- 10.47974/cjsim-2024-017
- Jan 1, 2025
- COLLNET Journal of Scientometrics and Information Management
- Research Article
- 10.47974/cjsim-2025-03004
- Jan 1, 2025
- COLLNET Journal of Scientometrics and Information Management
- Research Article
- 10.47974/cjsim-2024-11002
- Jan 1, 2025
- COLLNET Journal of Scientometrics and Information Management
- Research Article
- 10.47974/cjsim-2024-11006
- Jan 1, 2025
- COLLNET Journal of Scientometrics and Information Management
- Research Article
- 10.47974/cjsim-2023-0107
- Jan 1, 2025
- COLLNET Journal of Scientometrics and Information Management
- Research Article
- 10.47974/cjsim-2022-0085
- Jan 1, 2024
- COLLNET Journal of Scientometrics and Information Management