Short Text Classification Using Contextual Analysis

Sami Al Sulaimani, Andrew Starkey

doi:10.1109/access.2021.3125768

Abstract

Microblogging tools provide a real-time service for the public to express opinions, broadcast news and information, and comment on and respond to such output. Word usage in social media is continually evolving. Microbloggers may use different sets of words to describe a specific event, and they may use new words (i.e., words that exist neither in the training dataset nor in informal or formal dictionaries) or use existing words in new contexts. Dynamically capturing new words and their potential meaning from their context can help to reflect word relationships in social media, which can then be useful for solving various problems, such as event classification. Different approaches have been proposed in this regard; one of them is Contextual Analysis. This paper examines the potential of this approach for grouping short texts (tweets) that discuss the same event into the same category. A new transparent method for multi-class text categorization is presented. It uses the Contextual Analysis approach to capture the most important words in the context of an event and to detect the usage of similar words in different contexts. To test its efficacy in these areas, this study evaluates the performance of the proposed method against other well-known methods: Naïve Bayes, Support Vector Machines, K-Nearest Neighbors and Convolutional Neural Networks. On average, the experimental results show that the proposed multi-class classification method can effectively categorize tweets into various event groups, with high F1 scores: F1 > 97.09% in the imbalanced-classes experiments and F1 > 95.27% in the high-number-of-classes experiments. However, as with the baseline methods, its performance is negatively affected by imbalanced datasets. The Convolutional Neural Networks method performs best among all the algorithms, with F1 > 97.74% in all experiments, which is 1.73% and 2.72% higher than the lowest performance of Naïve Bayes and K-Nearest Neighbors, respectively, but it does not meet the requirement of transparent results.
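
The abstract reports F1 scores and notes that imbalanced classes hurt performance. The sketch below illustrates one way such a score could be computed for a multi-class task. It is not the authors' evaluation code: the abstract does not say which averaging scheme was used, so macro averaging is an assumption here, and the event labels (`flood`, `strike`), the toy predictions, and the helper names `per_class_f1` and `macro_f1` are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, using F1 = 2*TP / (2*TP + FP + FN) per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # an item of true class t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: 8 tweets about a frequent event, 2 about a rare one.
y_true = ["flood"] * 8 + ["strike"] * 2
# One of the two rare-event tweets is misclassified as the frequent event.
y_pred = ["flood"] * 8 + ["flood", "strike"]

scores = per_class_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
for label, f1 in scores.items():
    print(f"F1[{label}] = {f1:.3f}")
```

In this toy case a single error on the rare class leaves accuracy at 0.9 but pulls the rare class's F1 down to 2/3, which is the kind of sensitivity to class imbalance that a per-class or macro-averaged F1 is meant to expose.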

Journal: IEEE Access
Publication Date: Jan 1, 2021
Citations: 8
License type: CC BY-NC-ND 4.0
