Abstract
As the digitization of text continues to grow rapidly, the need to organize, categorize and classify text has become indispensable. Text that is poorly organized and rarely categorized or classified may slow down text and information retrieval. It is therefore essential to organize, categorize and classify texts and digitized documents according to definitions proposed by text mining experts and computer scientists. Computer and information scientists have already worked on Text Mining, Text Categorization and Automatic Text Classification, but considerable room for novel research in this domain remains. In this paper we propose mathematical notation and graphical models for Text Mining, Text Categorization and Automatic Text Classification to give an in-depth understanding of these techniques and concepts. Introducing these mathematical and graphical models is expected to shorten the response time of text and information retrieval. The performance of web search engines may also be improved by employing them.
Highlights
In the last fifteen years, content-based document management systems have gained outstanding status in the fields of Computer and Information Systems Engineering and Computer Science.
Regarding real-world applications of Text Categorization (TC), from the early 1960s to the late 1980s much of the work was done on Knowledge Engineering, which is one approach to TC. In Knowledge Engineering, if someone wanted to classify documents under given categories, the experts' knowledge was manually encoded as a set of rules.
Automatic Text Classification (ATC): Readers should keep clearly in mind that Automatic Text Classification (ATC) differs from Text Categorization (TC). We propose new definitions of ATC here, which differ from the definitions in the literature.
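To make the Knowledge Engineering approach mentioned in the highlights more concrete, the following is a minimal Python sketch of classification rules encoded by hand. The categories, keywords, and the `categorize` function are hypothetical and chosen only for illustration. They are not taken from the paper and do not represent its proposed models.

```python
# Hypothetical hand-written rules: each category maps to a predicate
# that inspects the set of lowercase words in a document.
RULES = {
    "sports": lambda words: "match" in words and ("goal" in words or "score" in words),
    "finance": lambda words: "stock" in words or "interest" in words,
}


def categorize(text):
    """Return every category whose manually encoded rule fires for the text."""
    words = set(text.lower().split())
    return [category for category, rule in RULES.items() if rule(words)]


if __name__ == "__main__":
    print(categorize("The match ended with a late goal"))
    print(categorize("Stock prices fell"))
```

Each new category or change in vocabulary requires an expert to edit the rules by hand, which is the main cost of this approach.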
Summary
In the last fifteen years, content-based document management systems have gained outstanding status in the fields of Computer and Information Systems Engineering and Computer Science. There are two reasons for the popularity of content-based management systems. Consider a room with many items and accessories scattered in all directions. Anyone searching for an item in this room has to make a considerable effort, both because the items are disorganized and because people tend to become confused when they see many things gathered together. If everything is organized and placed in its appropriate location, the search is easy and fast. Likewise, if text is categorized and documents are classified among categories, search and retrieval of text will be fast and efficient.