A survey on text document categorization using enhanced sentence vector space model and bi-gram text representation model based on novel fusion techniques

Abdisa Demissie Amensisa, Seema Patil, Poorva Agrawal

doi:10.1109/icisc.2018.8399067

Abstract

Large numbers of digital documents are generated and made available every day. Classifying them by hand into meaningful categories, such as important versus unimportant or spam versus no-spam, would take a vast amount of time and human effort. Text document classification falls under automatic classification (also known as pattern recognition) in machine learning and text mining. Large document collections need to be sorted into specific classes so that they are clearly organized and simple to search; classified data are easier for users to browse. The core task in text document categorization is assigning a previously unseen text to one or more predefined categories. Multiple classifiers can be fused to increase the classification accuracy for a single text document. This paper surveys text document classification based on different representation models and fusion-based classifiers. To examine the different techniques, the Enhanced Sentence Vector Space Model (ES-VSM) and a bi-gram model are used to represent the structure of the document being classified. Results are obtained by assessing several existing classifiers in terms of their accuracy. The survey is intended to orient new researchers and encourage them to take on the open challenges in this area.
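As a rough illustration of two ideas named in the abstract, the following Python sketch builds a word-level bi-gram representation of a text and fuses the labels produced by several classifiers. It rests on assumptions, since the abstract does not specify the details: bi-grams are taken over whitespace-separated words rather than characters, fusion is done by simple majority voting, and the function names `word_bigrams` and `majority_vote` are hypothetical. ES-VSM is not shown.

```python
from collections import Counter


def word_bigrams(text):
    """Count adjacent word pairs in a lowercased, whitespace-tokenized text.

    Assumption: word-level bi-grams; the abstract does not say whether the
    surveyed model uses words or characters.
    """
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def majority_vote(predictions):
    """Fuse labels from several classifiers by picking the most frequent one.

    Assumption: plain majority voting stands in for the fusion techniques
    covered by the survey. On a tie, the label seen first wins.
    """
    return Counter(predictions).most_common(1)[0][0]


# Bi-gram features for a short example text.
features = word_bigrams("free offer claim your free offer now")

# Labels that three hypothetical classifiers assigned to the same document.
fused = majority_vote(["spam", "no-spam", "spam"])
```

Here `features` maps each word pair to its count, so the pair `("free", "offer")` appears twice, and `fused` is the label chosen by most of the classifiers. In practice the bi-gram counts would be turned into a feature vector and fed to each classifier before their outputs are fused.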

Full Text