Abstract

Named entity recognition (NER) is a sequence labelling task that categorizes spans of text into specific types. Named entity (NE) boundary detection is a prominent research area within NER and has been widely applied in information extraction, event extraction, information retrieval, sentiment analysis and related tasks. NEs can be either flat or nested, and only limited research has addressed boundary detection for nested NEs. NER in low-resource settings is a current research trend. This work is scoped to NE boundary detection in Sinhala-related content extracted from social media, and its prime objective is to improve the approach to NE boundary detection. Given the low-resource setting, as an initial step the linguistic patterns, complexity matrices and structures of the extracted social media statements were analyzed. A dedicated corpus of more than 100,000 tuples of Sinhala-related social media content was annotated by an expert panel. As scientific novelties, the NE head word detection loss function introduced in HelaNER 1.0 has been further improved, and NE boundary detection has been enhanced by tuning the stack pointer networks. NE linking has also improved as a by-product of these enhancements. Various experiments were conducted and evaluated, and the results show that our enhancements achieve state-of-the-art performance over the existing baselines.
