Social media data are noisy and non-standardized, and the entities they contain are sparse and carry limited semantic information, which makes named entity recognition (NER) difficult for existing methods. To address these issues, this study proposes SEMFF-NER, an NER method for social media texts that integrates multi-scale features with syntactic information. First, global features are extracted with a Transformer-based encoder (XLNET) into which dependency syntactic relations are embedded to enhance the semantic representation. Next, sliding windows of different lengths capture local features, which are fed into a bidirectional long short-term memory network (BiLSTM) to obtain multi-level local features. A fusion-attention mechanism then integrates the global contextual information with these multiple local features to predict the optimal entity labels. Extensive experiments on three English datasets, two collected from social media platforms (WNUT2016, WNUT2017) and OntoNotes5.0_English, demonstrate the performance advantages of the proposed method, and ablation experiments further confirm the effectiveness of its components.
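The global/local fusion pipeline summarized above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration. The XLNET outputs (with embedded dependency relations) are taken as given per-token vectors. The sliding windows are applied to those same vectors. Each BiLSTM over a window is replaced by simple mean pooling. The fusion-attention step is approximated by per-token scaled dot-product attention in which the global vector queries itself together with the local vectors from each window length. The function names (`window_features`, `fuse`, `multi_scale_fusion`) and the window lengths (3, 5) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def window_features(tokens: List[Vector], window: int) -> List[Vector]:
    """Local feature per token: the mean of the vectors in a centered sliding
    window of odd length `window`, truncated at the sequence edges.
    Hypothetical stand-in for running a BiLSTM over each window."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    dim = len(tokens[0])
    out = []
    for i in range(len(tokens)):
        span = tokens[max(0, i - half): min(len(tokens), i + half + 1)]
        out.append([sum(v[d] for v in span) / len(span) for d in range(dim)])
    return out


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(global_feats: List[Vector], local_scales: List[List[Vector]]) -> List[Vector]:
    """Per-token scaled dot-product attention (assumed fusion scheme): the
    global vector is the query; the global vector and each local-scale vector
    serve as both keys and values."""
    dim = len(global_feats[0])
    fused = []
    for i, query in enumerate(global_feats):
        candidates = [query] + [scale[i] for scale in local_scales]
        scores = [sum(q * k for q, k in zip(query, cand)) / math.sqrt(dim)
                  for cand in candidates]
        weights = softmax(scores)
        fused.append([sum(w * cand[d] for w, cand in zip(weights, candidates))
                      for d in range(dim)])
    return fused


def multi_scale_fusion(global_feats: List[Vector],
                       windows: Sequence[int] = (3, 5)) -> List[Vector]:
    # One list of local features per window length, then attention fusion.
    local_scales = [window_features(global_feats, w) for w in windows]
    return fuse(global_feats, local_scales)


if __name__ == "__main__":
    # Toy "global" features for a 4-token sentence, 2 dimensions each.
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    fused = multi_scale_fusion(tokens, windows=(3, 5))
    print(len(fused), len(fused[0]))  # 4 2
```

Because the attention weights sum to one, each fused vector is a convex combination of the global vector and its local counterparts, so it keeps the encoder's dimensionality. In a full model the fused vectors would feed a tagging layer that predicts the entity labels. The abstract does not specify that layer, so it is omitted here.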