Topic LDA Research Articles

The LDA topic model is a generative document model and an unsupervised machine learning technique that can be used for keyword extraction, topic classification, and similar tasks. The main purpose of this paper is to evaluate the optimal number of topics for the LDA topic model in email subject classification and to assess the importance of each topic, so that the model's output is highly readable after subject classification. To this end, the paper proposes a keyword matching and subjective statistical value word comparison (KM-SSVW) method for the subject classification of emails; the keyword matching step uses TF-IDF. The method is reported to evaluate the extracted keywords accurately and to optimize the number of topics. The data come mainly from the "mail gate" corpus, a total of 7,000 messages exchanged between Hillary and others. The empirical results show that the proposed KM-SSVW method performs well both in optimizing the number of topics and in evaluating the readability of topic words. The paper still has limitations, however: the new method was not validated on other types of data sets, such as microblog short texts, XML documents, and WeChat public platform articles.
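The TF-IDF keyword step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's KM-SSVW method: the function name `tfidf_keywords`, the toy messages, and the exact weighting (relative term frequency times the natural log of inverse document frequency) are assumptions chosen for illustration, and each document is assumed to be a non-empty list of tokens.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document.

    Hypothetical helper for illustration; assumes every document is non-empty.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # TF is the relative frequency in the document; IDF is log(N / df).
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by descending score, then alphabetically for a stable order.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results

# Toy messages (invented, not from the paper's data set).
docs = [
    "meeting schedule meeting room".split(),
    "budget report schedule".split(),
    "budget vote senate vote".split(),
]
keywords = tfidf_keywords(docs, top_k=2)
print(keywords)
```

Terms that are frequent in one message but rare across the collection rank highest, which is why such scores are a common starting point for matching extracted keywords against topic words.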

This paper presents a novel approach to authorship identification in English and Urdu text that applies the LDA model to n-grams of authors' texts and compares the results with cosine similarity. The approach uses similarity metrics over learned representations of stylometric features to identify the writing style of a particular author, and it covers both instance-based and profile-based classification of an author's text. LDA handles high-dimensional, sparse data well because it allows a more expressive representation of text. The approach is an unsupervised computational methodology that can handle the heterogeneity of the dataset, diversity in writing, and the inherent ambiguity of the Urdu language. A large corpus was used for performance testing. The experimental results show that the proposed approach outperforms the state-of-the-art representations and other algorithms used for authorship identification. The main contribution of the work is the use of cosine similarity with n-gram-based LDA topics to measure similarity between vectors of text documents. The approach achieves an overall accuracy of 84.52% on the PAN12 datasets and 93.17% on Urdu news articles without using any labels for the authorship identification task.
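The comparison step described in the abstract above, cosine similarity between topic vectors, can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the topic proportions are invented, the LDA inference that would produce them is omitted, and the names `cosine_similarity`, `attribute_author`, and `author_profiles` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def attribute_author(doc_vector, author_profiles):
    """Return the author whose profile vector is most similar to doc_vector."""
    return max(
        author_profiles,
        key=lambda name: cosine_similarity(doc_vector, author_profiles[name]),
    )

# Invented topic-proportion vectors over three topics (assumed LDA output).
author_profiles = {
    "author_a": [0.70, 0.20, 0.10],
    "author_b": [0.10, 0.30, 0.60],
}
unknown_doc = [0.60, 0.30, 0.10]

predicted = attribute_author(unknown_doc, author_profiles)
print(predicted)
```

In a profile-based setup each author is represented by one aggregate vector, as here; an instance-based variant would instead compare the unknown document against the vector of every individual training document.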

Articles published on Topic LDA

The Effects of the LDA Topic Model on Sentiment Classification

Research on Evaluation Method of LDA Topic Model in Mail Classification

Analysis of Blockchain Academic Research Trends Using LDA Topic Modeling: Focusing on the United States, China, and South Korea

Design and Implementation of a Machine Learning-Based Authorship Identification Model

Characteristics of Rural Revitalization Projects in Jangsu-gun Through LDA Topic Analysis of News Data: Focusing on Tourism and Living Keywords

Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology

Topic Expansion Using Semantic Relatedness in an Infinite-Vocabulary Online LDA Topic Model

LDA topics: Representation and evaluation

An Algorithm of LDA Topic Reduction Based on Rough Set
