Abstract

As the number of online documents increases, so does the demand for document classification to aid their analysis and management. Text is cheap, but information, in the form of knowing which classes a document belongs to, is expensive. The main purpose of this paper is to explain how the expectation maximization (EM) technique from data mining can be used to classify documents, and how classification accuracy can be improved with a semi-supervised approach. The EM algorithm is applied with both a supervised and a semi-supervised approach, and the semi-supervised approach is found to be more accurate and effective. Its main advantage is the “dynamic generation of new classes”. The algorithm first trains a classifier on the labeled documents and then probabilistically classifies the unlabeled documents. The car dataset used for evaluation was collected from the UCI Machine Learning Repository, with some modifications of our own.
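
The train-then-probabilistically-classify loop described above can be sketched as a small EM procedure. This is a minimal sketch, not the paper's implementation: it assumes a multinomial naive Bayes base classifier with Laplace smoothing (the abstract does not name the classifier), the function names `train_nb`, `posterior`, `semi_supervised_em` and `predict` are hypothetical, the toy documents are illustrative only, and the "dynamic generation of new classes" feature is not modeled.

```python
import math
from collections import Counter


def train_nb(docs, resp, classes, vocab, alpha=1.0):
    """M-step (assumed naive Bayes): fit class priors and word probabilities
    from (possibly fractional) class memberships resp[i][c]."""
    n = len(docs)
    priors, word_probs = {}, {}
    for c in classes:
        weight = sum(r[c] for r in resp)
        priors[c] = (weight + alpha) / (n + alpha * len(classes))
        counts = Counter()
        for doc, r in zip(docs, resp):
            for w, k in Counter(doc).items():
                counts[w] += r[c] * k  # word counts weighted by membership
        total = sum(counts.values())
        word_probs[c] = {w: (counts[w] + alpha) / (total + alpha * len(vocab))
                         for w in vocab}
    return priors, word_probs


def posterior(doc, priors, word_probs):
    """E-step for one document: P(class | doc), computed in log space.
    Words outside the training vocabulary are ignored."""
    logp = {}
    for c in priors:
        lp = math.log(priors[c])
        for w in doc:
            if w in word_probs[c]:
                lp += math.log(word_probs[c][w])
        logp[c] = lp
    m = max(logp.values())  # subtract the max before exp() for stability
    z = sum(math.exp(v - m) for v in logp.values())
    return {c: math.exp(v - m) / z for c, v in logp.items()}


def semi_supervised_em(labeled, unlabeled, n_iter=10):
    """labeled: list of (tokens, label); unlabeled: list of token lists."""
    classes = sorted({y for _, y in labeled})
    docs_l = [d for d, _ in labeled]
    vocab = {w for d in docs_l + unlabeled for w in d}
    # Labeled documents keep fixed one-hot memberships throughout.
    resp_l = [{c: float(c == y) for c in classes} for _, y in labeled]
    # Step 1: supervised model trained on the labeled documents only.
    model = train_nb(docs_l, resp_l, classes, vocab)
    for _ in range(n_iter):
        # E-step: probabilistically classify the unlabeled documents.
        resp_u = [posterior(d, *model) for d in unlabeled]
        # M-step: retrain on labeled + soft-labeled unlabeled documents.
        model = train_nb(docs_l + unlabeled, resp_l + resp_u, classes, vocab)
    return model


def predict(doc, model):
    post = posterior(doc, *model)
    return max(post, key=post.get)


# Hypothetical toy documents, for illustration only.
labeled = [("cheap fuel efficient small car".split(), "economy"),
           ("luxury leather powerful engine".split(), "premium")]
unlabeled = ["small cheap car low fuel".split(),
             "powerful luxury sedan leather seats".split(),
             "efficient small hatchback".split()]
model = semi_supervised_em(labeled, unlabeled)
print(predict("cheap small hatchback".split(), model))  # economy
```

Here "hatchback" never appears in a labeled document; it only gains weight for the economy class because EM soft-assigns "efficient small hatchback" to that class. Records with categorical attributes, such as those in the car dataset, could in principle be fed through the same code by turning each attribute=value pair into a token; that encoding is an assumption, not something the abstract describes.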

Highlights

  • Data mining [2][3] is the extraction of useful knowledge from large amounts of data

  • Classification [11] is an important aspect of data mining and is a predictive modeling technique

  • Classification techniques [3] are used to solve real-world problems in various application domains as well as for research purposes; they are relevant to today’s business environment, and data warehouse architectures can evolve to deliver the value of data mining to end users

Introduction

Data mining tools can provide solutions to business problems that were too time-consuming to solve manually. Classification [11] is an important aspect of data mining and is a predictive modeling technique. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and they can be integrated with new products and systems as these are brought online. Classification techniques [3] are used to solve real-world problems in various application domains as well as for research purposes; they are relevant to today’s business environment, and data warehouse architectures can evolve to deliver the value of data mining to end users.
