Classification of Text Documents Based on a Probabilistic Topic Model

S N Karpovich, N N Teslya, A V Smirnov

doi:10.3103/s0147688219050034

Abstract

An approach to text document classification is proposed that uses a probabilistic topic model whose training document set contains objects of only one class. This approach makes it possible to identify positive samples (samples resembling the target class) in collections and streams of text documents. The article reviews models that solve text document classification problems and are trained on samples of a single class, and describes their key features. The Positive Example Based Learning-TM classification model is presented, and a software prototype implementing it as a basis for text document classification is developed. Despite having no information about negative document samples, the model achieves high classification accuracy, exceeding that of alternative approaches. Experiments show that the Positive Example Based Learning-TM model is superior in classification accuracy when a small training set is used.
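
The single-class setting described in the abstract can be illustrated with a minimal sketch. This is not the Positive Example Based Learning-TM model, whose details the abstract does not give. As a stand-in it uses a smoothed unigram word distribution, which can be viewed as a one-topic simplification of a probabilistic topic model. The class name `OneClassUnigramModel`, the naive tokenizer, the smoothing constant `alpha`, and the rule of taking the lowest training score as the acceptance threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class OneClassUnigramModel:
    """Word distribution estimated from positive documents only (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha        # additive smoothing constant (assumed value)
        self.counts = Counter()   # word counts over all positive documents
        self.total = 0            # total number of tokens seen in training
        self.threshold = None     # acceptance threshold, set in fit()

    def fit(self, positive_docs):
        for doc in positive_docs:
            self.counts.update(tokenize(doc))
        self.total = sum(self.counts.values())
        # Assumed rule: accept anything that scores at least as well as
        # the worst-scoring training document.
        self.threshold = min(self.score(doc) for doc in positive_docs)
        return self

    def score(self, doc):
        """Average per-token log-probability under the positive-class model."""
        tokens = tokenize(doc)
        if not tokens:
            return float("-inf")
        vocab = len(self.counts) + 1  # one extra slot for unseen words
        denom = self.total + self.alpha * vocab
        log_prob = sum(
            math.log((self.counts[t] + self.alpha) / denom) for t in tokens
        )
        return log_prob / len(tokens)

    def is_positive(self, doc):
        """True if the document resembles the training (target) class."""
        return self.score(doc) >= self.threshold


positive_docs = [
    "topic model for text classification",
    "probabilistic topic model of documents",
    "text documents classification with topic model",
]
model = OneClassUnigramModel().fit(positive_docs)
```

With this toy data, `model.is_positive("topic model for text classification")` holds, while a document made up entirely of unseen words, such as `"football match ended yesterday"`, falls below the threshold. No negative samples are used at any point; the threshold comes from the positive documents alone.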

Journal: Scientific and Technical Information Processing
Publication Date: Dec 1, 2019
Citations: 1

Similar Papers

A new text classification technique using small training sets
Fabio Clarizia ... Luca Greco
01 Nov 2011

Conditional topical coding
Jun Zhu ... Eric P Xing
21 Aug 2011

The Strategy of Discriminating False Comments on the Internet by Fusing Probabilistic Topic and Word Vector Models
Fei Long
The International Arab Journal of Information Technology | VOL. 21
01 Jan 2024

Ensemble based classification using small training sets: A novel approach
C V Krishna Veni ... T Sobha Rani
01 Dec 2014
