Named-Entity Recognition Using Automatic Construction of Training Data From Social Media Messaging Apps

Seungwook Lee, Youngjoong Ko

doi:10.1109/access.2020.3043261

Open Access

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 20
License type: CC BY 4.0

Affiliation: Dong-A University, Sungkyunkwan University

Abstract

In recent years, social media messaging app data has served as a valuable resource for extracting useful information, such as critical clues and evidence in legal trials and criminal investigations. Although these data can be of various types, they are mostly in the form of natural language text. Therefore, to extract information from them efficiently, it is essential to research practical natural language processing approaches. This study proposes applying a deep-learning-based named-entity recognition (NER) system to these messaging data as a natural language processing approach for information extraction. In addition, a system is presented that uses the distant supervision method to automatically construct NER training data for deep-learning models. Because social media messaging app data generally include a significant amount of noise, such as typographical and word-spacing errors, a NER system that is robust against these types of noisy data is required to extract information from the messaging data effectively. The results demonstrate that the proposed approach outperforms a NER system trained on manually labeled training data.

Highlights

With the recent popularization of smartphones and social network service (SNS) applications, private interpersonal communication has become easier through social media messaging (SMM).
Precision measures the quality of predictions, and it is represented as the ratio of the number of predicted named entities (NEs) that are correct answers to the number of NEs predicted by the proposed named-entity recognition (NER) system.
When syllable embedding with CNN and POS features are added to the baseline, the proposed method improved by 0.82%p on the large automatically labeled data (67,200 messages) generated by distant supervision.
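
The precision definition in the highlights can be illustrated with a small sketch. It assumes that each named entity is represented as a (start, end, type) span and that a prediction counts as correct only on an exact match with a gold span; the function name, the span representation, and the sample spans are illustrative assumptions, not taken from the paper.

```python
def precision(predicted, gold):
    """Ratio of correct predicted NEs to all predicted NEs.

    `predicted` and `gold` are sets of (start, end, entity_type) spans.
    Exact-match scoring is an assumption made for this sketch.
    """
    if not predicted:
        return 0.0  # no predictions: define precision as 0 to avoid dividing by zero
    correct = len(predicted & gold)  # predicted spans that also appear in the gold set
    return correct / len(predicted)


# Hypothetical spans: four predictions, three of which match the gold annotations.
predicted = {(0, 2, "PER"), (5, 7, "LOC"), (9, 10, "ORG"), (12, 14, "DAT")}
gold = {(0, 2, "PER"), (5, 7, "LOC"), (9, 10, "ORG"), (15, 16, "PER")}

print(precision(predicted, gold))
```

Representing entities as sets of spans keeps the count of correct predictions down to a single set intersection.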

Summary

INTRODUCTION

With the recent popularization of smartphones and social network service (SNS) applications, private interpersonal communication has become easier through social media messaging (SMM). Constructing NER training data manually wastes resources in terms of time and cost. To solve this issue, the distant supervision method, which is a semi-supervised learning method, was used in this study to construct training data automatically, resulting in automatically labeled data for deep-learning-based NER using the SMM app data. We achieved improved performance when the post-training method with automatically labeled data was applied and fine-tuning was conducted on BERT-based NER. The BiLSTM-CRF-based NER system trained on the large automatically labeled data showed a 14.14%p improvement compared to the small sample. This result shows that significant improvement can be achieved by applying the distant supervision technique and using large-sized unlabeled data.
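
As a rough illustration of the general idea behind distant supervision, the sketch below labels unlabeled messages by matching them against an entity dictionary and emitting BIO tags. This is a generic dictionary-matching scheme and not necessarily the authors' construction procedure; the function `distant_label`, the gazetteer entries, the entity types, and the whitespace tokenization are all hypothetical choices for this example.

```python
def distant_label(tokens, gazetteer, max_len=3):
    """Assign BIO tags by greedy longest-match lookup in an entity dictionary.

    `gazetteer` maps a lowercased phrase to an entity type (hypothetical data).
    Tokens not covered by any dictionary entry are tagged "O".
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in gazetteer:
                entity_type = gazetteer[phrase]
                tags[i] = "B-" + entity_type
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + entity_type
                i += n  # skip past the matched entity
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical dictionary and message, used only to show the labeling step.
gazetteer = {"seoul station": "LOC", "minsu": "PER"}
tokens = "meet minsu at seoul station tomorrow".split()
tags = distant_label(tokens, gazetteer)

for token, tag in zip(tokens, tags):
    print(token, tag)
```

Labels produced this way are noisy, since dictionary matches can be wrong or incomplete, which is one reason the resulting data are described as automatically labeled rather than gold-standard.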

Related Work

PROPOSED METHOD

EXPERIMENTS

EXPERIMENTAL RESULTS

Findings

CONCLUSIONS
