Abstract

Text classification has become an increasingly important topic because it allows us to extract meaningful information from data and improve the performance of businesses and organizations. It is also known as text tagging or text categorization, and the textual data it operates on can be structured, semi-structured, or unstructured. This work uses unstructured data collected through the Twitter API. The data are then structured using the NLP Cloud API, because manual sorting is time-consuming and tedious. The resulting structured data comprise a set of categorical labels assigned on the basis of the content of the comments. Text classification has various use cases, such as sentiment analysis, polarity checking, natural language inference, and assessing grammatical correctness. Earlier experimental work by previous researchers used Naive Bayes with a Bag of Words (BoW) feature extraction technique. The objective of this work is to analyze the transformed, structured, imbalanced data and to study its impact on the accuracy of a Naive Bayes model that uses Term Frequency-Inverse Document Frequency (TF-IDF) features. Naive Bayes is a linear, probabilistic, supervised machine learning classifier based on Bayes' theorem. On training and testing the data with the proposed model, an improvement of 2-3% in overall accuracy is observed.
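To make the pipeline described above concrete, the following is a minimal sketch of TF-IDF weighting feeding a multinomial Naive Bayes classifier, written with the Python standard library only. It is not the authors' implementation. The toy comments, the labels, the whitespace tokenizer, the smoothed IDF formula, and all function names (`fit_idf`, `tfidf`, `train_nb`, `predict`) are assumptions made for illustration; the abstract does not specify the dataset, the preprocessing, or the exact TF-IDF variant used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's preprocessing is not given.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    # Map a document to {term: tf * idf}; terms unseen during fitting are dropped.
    tf = Counter(tokenize(doc))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def train_nb(docs, labels, idf, alpha=1.0):
    # Multinomial Naive Bayes over TF-IDF weights with additive (Laplace) smoothing.
    class_counts = Counter(labels)
    weight_sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, weight in tfidf(doc, idf).items():
            weight_sums[label][term] += weight
    model = {}
    for label, n_docs in class_counts.items():
        total = sum(weight_sums[label].values())
        denom = total + alpha * len(idf)
        log_prior = math.log(n_docs / len(docs))
        log_like = {t: math.log((weight_sums[label].get(t, 0.0) + alpha) / denom) for t in idf}
        model[label] = (log_prior, log_like)
    return model


def predict(doc, model, idf):
    # Pick the class with the highest log prior plus weighted log likelihood.
    feats = tfidf(doc, idf)
    scores = {
        label: log_prior + sum(w * log_like[t] for t, w in feats.items())
        for label, (log_prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical, deliberately imbalanced toy data (4 positive, 2 negative comments).
train_docs = [
    "great phone love the camera",
    "love this great battery",
    "great service love it",
    "awesome camera great value",
    "terrible battery hate it",
    "hate the awful screen",
]
train_labels = ["positive"] * 4 + ["negative"] * 2

idf = fit_idf(train_docs)
model = train_nb(train_docs, train_labels, idf)

print(predict("love the great camera", model, idf))
print(predict("awful battery hate", model, idf))
```

Because IDF shrinks the weight of terms that occur in many comments, a frequent word such as "great" contributes less per occurrence than a rare one such as "awful". This is the main difference from raw Bag of Words counts. Whether it improves accuracy on a given imbalanced dataset has to be checked empirically, as the work described here does.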

