Abstract

Latent Dirichlet allocation (LDA) is a representative topic model for extracting keywords related to the latent topics embedded in a document set. Despite its effectiveness in finding underlying topics in documents, the traditional LDA algorithm has no process for reflecting the sentimental meaning of text during topic extraction. Focusing on this issue, this study investigates the usability of both LDA and sentiment analysis (SA) algorithms based on the affective level of text. It defines the affective level of a given set of paragraphs and analyzes the perceived trust in the two methods with regard to usability. In our experiments, text from the college scholastic ability test was selected as the set of evaluation paragraphs, and the affective level of the paragraphs was manipulated at three levels (low, medium, and high) as the independent variable. The LDA algorithm was used to extract the keywords of each paragraph, while SA was used to identify the positive or negative mood of the extracted subject words. In addition, the participants rated their perceived trust in each algorithm, and this study tests whether the score differs according to the affective level of the paragraphs. The results show that paragraphs with low affect lead to high perceived trust in LDA among the participants. However, the perceived trust in SA does not differ significantly across the affective levels. These findings indicate that LDA is more effective at finding topics in text that mainly contains objective information.

Highlights

  • The amount of data processed in technical and social systems has increased exponentially with the advent of the fourth industrial revolution and the era of knowledge information processing. Massive text information and public opinions commonly recorded and shared through various social media services have created a need for new technologies and methodologies to find meaningful information hidden in large sets of unstructured text data.

  • The main result of this study is how the perceived trust scores (the dependent variable) of latent Dirichlet allocation (LDA) and sentiment analysis (SA) differ according to the paragraph affective level (the independent variable).

  • The results of the LDA perceived trust evaluation by paragraph affective level are as follows: the mean score is 3.40 at the low level, 2.97 at the medium level, and 3.11 at the high level.


Summary

Introduction

Massive text information and public opinions commonly recorded and shared through various social media services have created a need for new technologies and methodologies to find meaningful information hidden in large sets of unstructured text data. Text mining has attracted attention as a technique for extracting meaningful information from unstructured or semi-structured text, such as documents, emails, and hypertext markup language (HTML). Topic modeling is a popular text mining method that extracts highly interpretable topics from a document set. The latent Dirichlet allocation (LDA) algorithm is a representative topic modeling approach [1] in which each document is modeled as a mixture of latent topics whose proportions follow a Dirichlet distribution, and each topic is described as a distribution, also with a Dirichlet prior, over the terms occurring in the document set. The LDA algorithm has been applied to various domains, such as topic
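To make the description of LDA above concrete, the following is a minimal sketch of topic extraction on a toy corpus. It uses collapsed Gibbs sampling, a common inference method for LDA that is not necessarily the one used in this study. The function names (`lda_gibbs`, `top_words`), the hyperparameter values, and the four toy documents are all assumptions made for illustration and do not come from the paper.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    alpha and beta are the symmetric Dirichlet hyperparameters for the
    document-topic and topic-word distributions (illustrative values).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * n_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                        # total tokens in topic k
    assign = []                                         # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        assign.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            assign[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return up to n most frequent words for each topic."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


# Hypothetical toy corpus: two school-related and two nature-related documents.
docs = [
    "exam score student exam grade".split(),
    "student grade score exam".split(),
    "river forest mountain river lake".split(),
    "lake forest mountain river".split(),
]

doc_topic, topic_word = lda_gibbs(docs)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

The top words of each topic play the role of the extracted keywords. Because the sampler is stochastic, which words end up in which topic depends on the seed and the number of iterations, so a real application would normally rely on an established library rather than this sketch.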

