Abstract

Natural Language Processing (NLP) is a prominent branch of Artificial Intelligence (AI) concerned with how machines comprehend and interpret human language. A computer can only process information in the form of numbers, not words. It is therefore critical to study which preprocessing and feature extraction techniques must be applied to human language so that, once transformed into machine-readable form, it can be understood by the computer. Text classification plays a vital role in NLP tasks, with applications in web search, document categorization, chatbots, virtual assistants, and other areas. Unstructured sentences and documents are inherently difficult to convert into a machine-readable format. Preprocessing techniques receive special attention because they serve as a precursor to the subsequent stages of information retrieval; any errors introduced in these early steps propagate to the later stages of the NLP pipeline. In addition, the order in which techniques such as tokenization, stop-word removal, and lemmatization are applied must be carefully considered. Information retrieval systems are particularly concerned with how effectively textual input is cleaned and filtered to remove noisy material that does not improve efficiency and, moreover, leads to incorrect results. This survey emphasizes the importance of effective text preprocessing approaches, as well as information retrieval using NLP feature extraction techniques.
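
To make the pipeline concrete, the sketch below (not taken from the surveyed paper; the library choices and example sentences are illustrative assumptions) applies tokenization, stop-word removal, and lemmatization with NLTK, then converts the cleaned text into numeric vectors with scikit-learn's TF-IDF vectorizer as one common feature extraction technique:

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

# One-time NLTK resource downloads (tokenizer model, stop-word list, WordNet).
for resource in ("punkt", "stopwords", "wordnet"):
    nltk.download(resource, quiet=True)

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    """Tokenize, remove stop words, and lemmatize one raw document."""
    tokens = word_tokenize(text.lower())                    # 1. tokenization
    tokens = [t for t in tokens if t.isalpha()]             # drop punctuation/numbers (noise)
    tokens = [t for t in tokens if t not in stop_words]     # 2. stop-word removal
    return [lemmatizer.lemmatize(t) for t in tokens]        # 3. lemmatization

# Hypothetical example documents, used only to show the pipeline end to end.
docs = [
    "Chatbots and virtual assistants rely on text classification.",
    "Information retrieval systems filter noisy, unstructured documents.",
]

cleaned = [" ".join(preprocess(d)) for d in docs]

# Feature extraction: turn the cleaned text into numeric TF-IDF vectors.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(cleaned)
print(vectorizer.get_feature_names_out())
print(features.toarray())

Removing stop words before lemmatizing, as above, is only one possible ordering; the abstract's point is precisely that such ordering decisions, and the quality of each cleaning step, carry through to retrieval results.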
