Abstract

Named entity recognition (NER) is a core task in natural language processing (NLP). Since the 2000s, neural networks have been used to represent language, and they have improved entity recognition results. However, neural networks, and convolutional neural networks (CNNs) in particular, have not previously been applied to NER for Setswana. CNNs have recently been applied to a range of NLP problems with promising results. They are widely used in NLP because they are relatively easy to train and perform strongly on sequence-labelling tasks. Their convolutional filters model dependencies among combinations of neighbouring words. Given the difficulty of identifying named entities in South African languages, including Setswana, and the scarcity of resources for them, this research proposes a CNN model for Setswana NER. The results are benchmarked against traditional methods such as conditional random fields (CRF), and performance metrics such as the F1-score are used to assess the reliability of the proposed model. The model is evaluated on the Setswana NER dataset from the South African Centre for Digital Language Resources. The proposed CNN model achieves an F1-score of 94.0%, compared with 78.0% for the existing CRF model.
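
The abstract reports the F1-score as its main metric but does not say how it is computed. A common convention for NER is entity-level, exact-match scoring over BIO-tagged sequences, in which a predicted entity counts as correct only if its boundaries and type both match the gold annotation. The sketch below assumes that convention. The tag names (`B-PER`, `I-LOC`, and so on) and the helper functions `extract_spans` and `entity_f1` are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

# (start, end_exclusive, entity_type); hypothetical span representation
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence such as B-PER, I-PER, O."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        is_mismatched_inside = tag.startswith("I-") and tag[2:] != etype
        if tag.startswith("B-") or tag == "O" or is_mismatched_inside:
            if etype is not None:
                spans.add((start, i, etype))  # close the currently open entity
                start, etype = None, None
            if tag != "O":
                # B- opens an entity; a stray I- is treated as a new entity start
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 with exact span and type matching."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        tp += len(gold_spans & pred_spans)  # entities found with correct span and type
        fp += len(pred_spans - gold_spans)  # predicted entities absent from the gold data
        fn += len(gold_spans - pred_spans)  # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "O"]]
    print(entity_f1(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```

In this toy example the model finds the person entity exactly but misses the location, so precision is 1.0, recall is 0.5, and F1 is their harmonic mean. Token-level scoring would give different numbers, which is why the matching convention should be stated alongside any reported F1-score.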
