Abstract

Text categorization is the task of assigning predefined category labels to unlabeled documents. With the exponential growth in the accessibility and availability of digital documents over the past decade, the field has attracted considerable attention from the scientific community, which increasingly demands rapid and accurate categorization of these documents. Relying on experts for manual classification is time-consuming and resource-intensive. Consequently, labeling unlabeled digital documents faster, more accurately, and more efficiently has become a necessity. One promising approach to meeting this demand is the use of machine learning algorithms. Training these algorithms on a large dataset of labeled texts lets them learn patterns and predict labels for unlabeled documents. This strategy can greatly expedite categorization while retaining a substantial level of accuracy. These algorithms have also enhanced natural language processing techniques, making them more accurate at classifying unlabeled digital documents. In this study, we propose a novel machine learning framework to address this challenge. Our framework incorporates a novel Bangla stemmer, which reduces words to their stems. We then employ TF-IDF, a statistical measure of how relevant a word is to a document, for document vectorization. Experimental results show that our framework significantly improves prediction performance, achieving 95.3% prediction accuracy.
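To illustrate the TF-IDF vectorization step, here is a minimal sketch in plain Python. It assumes one common weighting variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); the abstract does not state which variant the framework uses. The function name `tf_idf` and the toy documents are hypothetical, and the tokens stand in for already-stemmed words rather than output of the proposed Bangla stemmer.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / document length and idf = log(N / df);
    this is an illustrative variant, not necessarily the paper's.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, already-stemmed toy documents (not from the paper's dataset).
docs = [
    ["match", "goal", "team"],
    ["election", "vote", "team"],
    ["match", "score", "goal"],
]
vectors = tf_idf(docs)
```

Under this weighting, a term that occurs in fewer documents (such as "election" here) receives a larger weight than one shared across documents (such as "team"), which is what makes the resulting vectors useful as classifier input.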
