Categorizing the Document Using Multi Class Classification in Data Mining

Shweta Joshi, Bhawna Nigam

doi:10.1109/cicn.2011.50

Abstract

Classification is the process of dividing data into a number of groups, which may be dependent on or independent of each other, where each group acts as a class. Classification can be carried out with several methods and different types of classifiers. It becomes harder, however, when it is applied to text documents, that is, document classification. The main purpose of this paper is to analyze the task of multi-class document classification and to learn how high classification accuracy can be achieved for text documents. The Naive Bayes approach is used to address document classification via a deceptively simple model: assume all features are independent of one another given the class, and assign a document to the class with maximal probability. The Naive Bayes approach is applied in a flat (linear) and a hierarchical manner to improve the efficiency of the classification model. Hierarchical classification is found to be more effective than flat classification. It also performs better in the case of multi-label document classification. The dataset used for evaluation is taken from the UCI repository, with some modifications made by the authors.
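To make the contrast between flat and hierarchical Naive Bayes concrete, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes a multinomial word-count model with add-one (Laplace) smoothing and whitespace tokenization. The toy documents, the class names, and the two-level hierarchy in `parent_of` are hypothetical, as are the names `NaiveBayes`, `fit_hierarchical`, and `predict_hierarchical`.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # log P(class) + sum of log P(word | class), words treated as independent
            score = math.log(n_docs / total_docs)
            for token in doc.lower().split():
                if token not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label  # class with maximal posterior probability


def fit_hierarchical(docs, labels, parent_of):
    """Train one top-level classifier plus one classifier per parent class."""
    top = NaiveBayes().fit(docs, [parent_of[label] for label in labels])
    children = {}
    for parent in set(parent_of.values()):
        pairs = [(d, l) for d, l in zip(docs, labels) if parent_of[l] == parent]
        if pairs:
            sub_docs = [d for d, _ in pairs]
            sub_labels = [l for _, l in pairs]
            children[parent] = NaiveBayes().fit(sub_docs, sub_labels)
    return top, children


def predict_hierarchical(model, doc):
    """Pick the parent class first, then choose among its children only."""
    top, children = model
    return children[top.predict(doc)].predict(doc)


# Hypothetical toy corpus and hierarchy, for illustration only.
docs = [
    "goal striker football match",
    "batsman wicket cricket match",
    "processor memory hardware chip",
    "compiler code software bug",
]
labels = ["football", "cricket", "hardware", "software"]
parent_of = {"football": "sports", "cricket": "sports",
             "hardware": "computing", "software": "computing"}

flat = NaiveBayes().fit(docs, labels)
hier = fit_hierarchical(docs, labels, parent_of)
print(flat.predict("striker scored a goal"))
print(predict_hierarchical(hier, "compiler bug in code"))
```

In the flat variant a single classifier chooses among all leaf classes at once. In the hierarchical variant each classifier only separates the classes under one node, so every decision involves fewer and more closely related candidates. The sketch covers the single-label case only; the multi-label setting mentioned above is not shown.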
