A Natural Language Processing System using CWS Pipeline for Extraction of Linguistic Features

Sandeep Kumar, Arun Solanki

doi:10.1016/j.procs.2023.01.155

Open Access

https://doi.org/10.1016/j.procs.2023.01.155

Journal: Procedia Computer Science
Publication Date: Jan 1, 2023
Citations: 2
License type: CC BY-NC-ND

Affiliation: Gautam Buddha University

Abstract

Understanding grammar rules and linguistic features is essential to understanding the context of a language, and therefore the language itself. The same holds for Natural Language Processing (NLP): linguistic features are what allow a system to understand language. This paper introduces how three linguistic features, Coreference, Word-sense, and Semantic knowledge (CWS), work, and how they can improve the Natural Language Understanding (NLU) and NLP tasks of any NLP model or application, whether existing or new. It proposes a CWS pipeline method to enhance the efficiency and performance of NLP applications such as text summarization, information retrieval, question answering, and machine reading comprehension. The proposed CWS pipeline model relies on pre-training with the CoNLL-2012 coreference dataset, which is extracted from the widely used OntoNotes 5.0 dataset, for the English language. The model is implemented in Python. Performance is evaluated on the standard CoNLL-2012 English coreference dataset, with the coreference-marked output compared against the manually tagged gold standard. The proposed CWS pipeline model achieves an average F1 score of 78.98% on the MUC metric, 1.78% higher than the best result of previous models, and thus performs better than existing models.
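
Since the headline result is an F1 score on the MUC metric, a small illustration of how that metric is commonly computed may help. The sketch below follows the standard link-based MUC definition (recall counts how many coreference links in each gold chain survive when the chain is partitioned by the system chains; precision swaps the two roles). It is not the authors' evaluation code: the function names `muc_scores` and `_muc_recall`, the set-of-mentions representation of chains, and the toy mentions are all assumptions made for illustration, and each mention is assumed to belong to at most one chain per side.

```python
from typing import Hashable, List, Set, Tuple

Chain = Set[Hashable]  # one coreference chain: a set of mention identifiers


def _muc_recall(key: List[Chain], response: List[Chain]) -> float:
    """Link-based MUC recall of `response` measured against `key`."""
    # Map every response mention to the index of the chain that contains it.
    mention_to_chain = {}
    for idx, chain in enumerate(response):
        for mention in chain:
            mention_to_chain[mention] = idx

    correct_links = 0
    total_links = 0
    for chain in key:
        # Response chains that intersect this key chain.
        touched = {mention_to_chain[m] for m in chain if m in mention_to_chain}
        # Mentions missing from the response each form their own partition.
        missing = sum(1 for m in chain if m not in mention_to_chain)
        partitions = len(touched) + missing
        correct_links += len(chain) - partitions
        total_links += len(chain) - 1
    return correct_links / total_links if total_links else 0.0


def muc_scores(key: List[Chain], response: List[Chain]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under the MUC metric."""
    recall = _muc_recall(key, response)
    precision = _muc_recall(response, key)  # same computation, roles swapped
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the gold standard has one chain of four mentions,
# and the system output splits it into two chains.
gold = [{"a", "b", "c", "d"}]
system = [{"a", "b"}, {"c", "d"}]
p, r, f = muc_scores(gold, system)
print(round(p, 3), round(r, 3), round(f, 3))  # 1.0 0.667 0.8
```

In the toy example every link the system proposes is correct (precision 1.0), but one of the three gold links is lost by the split (recall 2/3), which gives an F1 of 0.8. Published CoNLL-2012 results are normally produced with the official reference scorer rather than a hand-written function like this one.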

Full Text