Abstract

Recognizing infrequent or emerging named entities in user-generated text is a challenging task, especially when the text is informal or contains slang. Some recent works propose using a gazetteer to address this problem, but this solution does not generalize, because a gazetteer is task-specific and costly to maintain. In this paper, we overcome this drawback by presenting Local Distance Neighbor (LDN), a novel feature that replaces the gazetteer and enables the model to obtain state-of-the-art results. LDN provides an initial guess of the category of each input token based on the categories of its neighboring tokens in an embedding space. We evaluated the proposed network on the W-NUT-2017 dataset and obtained state-of-the-art F1 scores for the Group, Person, and Product categories. We used our new feature together with the model proposed by Aguilar et al. to recognize named entities in the Tor Darknet related to suspicious activities associated with the selling of weapons and drugs. After extending the W-NUT-2017 dataset with 851 manually annotated entries, we repeated our evaluation on this extended version of the dataset, achieving entity and surface F1 scores of 52.96% and 50.57%, respectively. Furthermore, we demonstrate that our proposal can help Law Enforcement Agencies mine textual information in Tor hidden services, especially for the Group, Person, and Product categories.
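The idea of an initial category guess derived from a token's neighbours in embedding space can be illustrated with a minimal sketch. It is not the authors' exact LDN formulation: it assumes a small labelled vocabulary of toy embeddings, cosine distance, the k nearest labelled neighbours, and inverse-distance weighting. The names `ldn_feature`, `labelled`, and `CATEGORIES`, as well as all vectors and labels, are hypothetical.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def ldn_feature(query_vec, labelled, categories, k=3, eps=1e-9):
    """Hypothetical LDN-style feature: a distance-weighted category
    distribution over the k nearest labelled neighbours of query_vec.

    labelled maps word -> (embedding, category); the result is one
    probability per entry of `categories`, in the same order.
    """
    # Rank every labelled word by its distance to the query token.
    neighbours = sorted(
        (cosine_distance(query_vec, vec), cat) for vec, cat in labelled.values()
    )[:k]
    # Closer neighbours contribute more evidence for their category.
    scores = defaultdict(float)
    for dist, cat in neighbours:
        scores[cat] += 1.0 / (dist + eps)
    total = sum(scores.values())
    return [scores[c] / total for c in categories]


# Toy, made-up embeddings; real ones would come from a pretrained model.
CATEGORIES = ["person", "product", "O"]
labelled = {
    "alice": ([0.9, 0.1, 0.0], "person"),
    "bob": ([0.8, 0.2, 0.1], "person"),
    "glock": ([0.1, 0.9, 0.2], "product"),
    "the": ([0.0, 0.1, 0.9], "O"),
}

# An unseen token whose embedding lies close to the two person names.
feature = ldn_feature([0.85, 0.15, 0.05], labelled, CATEGORIES, k=2)
print(feature)  # [1.0, 0.0, 0.0]
```

In a tagger, a vector like `feature` could be concatenated to each token's input representation, playing the role that a gazetteer lookup would otherwise play; whether LDN is combined with the network in exactly this way is an assumption of this sketch.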
