Information-Theoretic Method for Classification of Texts

B Ya Ryabko, A E Gus’kov, I V Selivanova

doi:10.1134/s0032946017030115

Journal: Problems of Information Transmission
Publication Date: May 13, 2017
Citations: 7

Affiliation: Siberian Branch of the Russian Academy of Sciences

#Classification Error #Method For Automatic Classification

Abstract

We consider a method for automatic text classification (i.e., classification without human involvement) based on universal source coding (“data compression”). We show that, under certain restrictions, the proposed method is consistent, i.e., the classification error tends to zero as text length increases. As an example of practical use, we consider the classification of scientific texts (research papers, books, etc.). Experiments show the proposed method to be highly efficient.
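The general idea of classifying by compression can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a commonly used scheme in which a text is assigned to the class whose sample corpus makes it cheapest to encode, and it substitutes zlib (DEFLATE) for a universal code. The names `compressed_size` and `classify`, and the use of one sample string per class, are hypothetical choices made for illustration.

```python
import zlib
from typing import Dict


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def classify(text: str, class_samples: Dict[str, str]) -> str:
    """Return the label whose sample makes `text` cheapest to encode.

    The cost for a class is the growth in compressed size when `text`
    is appended to that class's sample. This is an assumed stand-in for
    the code-length criterion, not the paper's exact procedure.
    """
    if not class_samples:
        raise ValueError("class_samples must not be empty")
    query = text.encode("utf-8")
    best_label, best_cost = None, None
    for label, sample in class_samples.items():
        base = sample.encode("utf-8")
        # Extra bytes needed to describe the query given the class sample.
        cost = compressed_size(base + b" " + query) - compressed_size(base)
        if best_cost is None or cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


if __name__ == "__main__":
    # Toy corpora, invented for this example only.
    samples = {
        "coding": "the entropy of a stationary source bounds the length "
                  "of any lossless code " * 20,
        "biology": "the enzyme binds the substrate and lowers the "
                   "activation energy of the reaction " * 20,
    }
    print(classify("the entropy of a stationary source bounds the length", samples))
```

Because DEFLATE only looks back 32 KB, this sketch is meaningful only for short samples; a practical system would need a compressor whose model can cover the whole class corpus.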

Full Text