Abstract

Over time, the volume of digitized text has grown enormously, and the need to organize, categorize, and classify it has become indispensable. Poorly organized, categorized, and classified text can slow down text and information retrieval. It is therefore essential to organize, categorize, and classify texts and digitized documents according to the definitions proposed by text mining experts and computer scientists. Computer and information scientists have already worked on Text Mining, Text Categorization, and Automatic Text Classification, but there is still considerable room for novel research in this domain. In this paper we propose mathematical notation and graphical models for Text Mining, Text Categorization, and Automatic Text Classification to provide an in-depth understanding of these techniques and concepts. Introducing these mathematical and graphical models will shorten the response time of text and information retrieval. Employing them can also improve the performance of web search engines.

Highlights

  • In the last fifteen years, content-based document management systems have gained a prominent status in the fields of Computer and Information Systems Engineering and Computer Science

  • In real-world applications of Text Categorization (TC) from the early 1960s to the late 1980s, much of the work was done on Knowledge Engineering, which is one approach to TC. In Knowledge Engineering, if someone wanted to classify documents under given categories, the experts' knowledge was manually encoded as a set of rules

  • Automatic Text Classification (ATC): readers should clearly understand that Automatic Text Classification (ATC) and Text Categorization (TC) are different. We propose new definitions of ATC here that differ from the definitions in the literature
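
The Knowledge Engineering approach mentioned in the highlights above, in which expert knowledge is encoded by hand as a set of rules, can be illustrated with a minimal sketch. The category names, keywords, and the `categorize` function are hypothetical and chosen only for illustration; they are not taken from the paper, and simple keyword matching is just one possible form such rules could take.

```python
import re

# Hypothetical hand-written rules: each category maps to keywords
# that a domain expert might choose (illustrative only, not from the paper).
RULES = {
    "sports": {"match", "team", "goal"},
    "finance": {"bank", "stock", "loan"},
}


def categorize(document, rules=RULES):
    """Return the sorted categories whose keyword rule fires on the document."""
    # Lowercase the text and split it into alphabetic words.
    words = set(re.findall(r"[a-z]+", document.lower()))
    # A rule fires when the document shares at least one keyword with it.
    return sorted(cat for cat, keywords in rules.items() if words & keywords)
```

For example, `categorize("The team scored a late goal.")` would place the document under the hypothetical `sports` category. Every rule in such a system has to be written and maintained by hand, which is the manual effort the highlight describes.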


Summary

INTRODUCTION

In the last fifteen years, content-based document management systems have gained a prominent status in the fields of Computer and Information Systems Engineering and Computer Science. There are two reasons for the popularity of content-based management systems. Consider a room with many things and accessories scattered in all directions. Anyone searching for an item in this room must make a great deal of effort, both because the items are disorganized and because people tend to be confused by the sight of many things gathered together. If everything is organized and placed in its appropriate location, the search is easy and fast. In the same way, if text is categorized and documents are classified among categories, the search and retrieval of text will be fast and efficient.

TEXT CATEGORIZATION
Knowledge Engineering Approach
Machine Learning Paradigm Approach
Advantages of Machine Learning Paradigm Approach
TEXT MINING
Mathematical Notation of Automatic Text Classification
FUTURE RESEARCH WORK
CONCLUSIONS