Abstract
In recent years, an unexpected amount of growth of the text documents volume has been observed on the internet, intranet, in digital libraries and newsgroups. To obtain useful information and meaningful patterns from these documents, a great many researchers known under the term “text mining” have been carried out. Among them text categorization is to be mentioned that covers the problem of classifying documents relative to their similarities. One of techniques applied in this area is called centroid-based document classification method. All researchers on text categorization use the notion of frequency somehow or other. In this study, letter frequencies (LF) have been used for text categorization. By making use of letter frequencies information, the centroid-based document classification has been carried out. An experiment has been done on language detection for text documents. Its results allow propose that the letter-based text categorization should be done prior to term based text categorization.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.