Abstract
Different text classification tasks have specific, task-dependent features, and the performance of a text classification algorithm is strongly affected by them. It is therefore crucial for text classification algorithms to extract task-specific features and thus improve classification performance across different tasks. Existing text classification algorithms use attention-based neural network models to capture contextualized semantic features while ignoring task-specific features. In this paper, a text classification algorithm based on a label-improved attention mechanism is proposed that integrates both contextualized semantic and task-specific features. By using label embeddings to learn both the word vectors and a modified TF-IDF matrix, task-specific features can be extracted, and attention weights are then assigned to different words according to these features, improving the effectiveness of attention-based neural network models on text classification. Experiments are carried out on three text classification data sets to verify the performance of the proposed method: a six-category question classification data set, a two-category user comment data set, and a five-category sentiment data set. Results show that the proposed method achieves an average improvement of 3.02% and 5.85% in F1 score over the existing LSTMAtt and SelfAtt models, respectively.
Highlights
Thanks to the rapid development of Internet technology, text information on the Internet has increased explosively, raising higher requirements for handling such data
Existing text classification research can be divided into two categories: traditional text classification methods based on machine learning, and deep learning methods based on neural network models
The results indicate that our model can improve the attention mechanism through label embedding, obtain a text representation closer to the text classification goal, and improve the effect of text classification
Summary
Thanks to the rapid development of Internet technology, text information on the Internet has increased explosively, raising higher requirements for handling such data. Traditional text classification mainly uses machine learning algorithms such as logistic regression [7], decision trees [4], support vector machines [5], and naive Bayes [8]. These algorithms have achieved good performance on text classification tasks, but they require manual extraction of text features, which is cumbersome and costly. Neural network models support automatic feature extraction and show better potential in multi-task text classification than traditional machine-learning-based models. It has been shown that attention-based text classification algorithms (e.g., LSTMAtt [22], SelfAtt [9]) outperform BiGRU and LSTM on text classification tasks. However, these attention-based algorithms assign weights to different words mainly according to the text semantics, without considering the impact of task-specific features.
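The abstract describes the core idea only at a high level: attention weights are derived not just from semantics but from label embeddings and a modified TF-IDF matrix. The sketch below illustrates one plausible reading of that idea, assuming attention scores are computed from word-label embedding similarity scaled by per-word TF-IDF weights; all function names, shapes, and the use of a max over labels are illustrative assumptions, not the paper's published formulation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def label_attention(word_vecs, label_vecs, tfidf):
    """Hypothetical label-guided attention for one document.

    word_vecs:  (n_words, d)  word embeddings of the document
    label_vecs: (n_labels, d) label embeddings (assumption: one per class)
    tfidf:      (n_words,)    TF-IDF weight of each word (task-specific signal)
    Returns a (d,)-dimensional document representation.
    """
    # Cosine similarity between every word and every label embedding
    w = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    l = label_vecs / np.linalg.norm(label_vecs, axis=1, keepdims=True)
    sim = w @ l.T                       # (n_words, n_labels)
    # A word matters if it is close to *some* label; scale by TF-IDF
    scores = sim.max(axis=1) * tfidf    # (n_words,)
    alpha = softmax(scores)             # attention weights over words
    return alpha @ word_vecs            # weighted sum = document vector

rng = np.random.default_rng(0)
doc = label_attention(rng.normal(size=(5, 8)),   # 5 words, dim 8
                      rng.normal(size=(3, 8)),   # 3 class labels
                      np.array([0.1, 0.9, 0.3, 0.2, 0.5]))
print(doc.shape)  # (8,)
```

The design point this sketch tries to capture is the one the abstract emphasizes: the TF-IDF factor injects task-specific word importance into the attention distribution, so that two words with similar contextual semantics can still receive different weights for a given classification task.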