Abstract

Named entity recognition (NER) is a fundamental task in natural language processing and belongs to the category of sequence labeling problems. Named entities are the main carriers of information in a text and express its main content. Identifying them accurately is essential for natural language processing applications such as information extraction, information retrieval, machine translation, and question answering. Named entities in business documents contain important information that can bring significant business value. In this paper, we propose a method that combines a bidirectional long short-term memory network (BiLSTM) with a conditional random field (CRF), uses both n-gram features and character features, and introduces an attention mechanism to identify three entity types in bidding documents: the tenderee, the bidding agent, and the project number. Compared with a unidirectional LSTM, the BiLSTM captures context information better and extracts more features. The CRF decodes the features produced by the BiLSTM to obtain the final labeling result. In comparative experiments on a collected data set of 20,000 bidding documents, the proposed BiLSTM-CRF model produces better labeling results than the other models and meets our expectations.
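To illustrate the decoding step, the following is a minimal sketch of how a CRF layer might turn per-token scores from a BiLSTM into a tag sequence. It assumes Viterbi decoding, which is the usual choice for linear-chain CRFs but is not confirmed by the abstract. The tag names, the emission scores, and the transition scores are all hypothetical values chosen for illustration and are not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical BIO tag set for one entity type (tenderee); not from the paper.
TAGS = ["O", "B-TENDEREE", "I-TENDEREE"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at token t (e.g. from a BiLSTM).
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters).
    """
    n_tags = len(transitions)
    score = list(emissions[0])   # best score of any path ending in each tag
    backpointers = []            # best previous tag for each step and tag

    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_prev = max(range(n_tags),
                            key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(range(n_tags), key=lambda i: score[i])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative scores only. The transition O -> I-TENDEREE is heavily
# penalized because an entity cannot continue without having begun.
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 0.0],    # from B-TENDEREE
    [0.0, 0.0, 0.0],    # from I-TENDEREE
]
emissions = [
    [2.0, 0.5, 0.0],    # token 0
    [0.0, 1.0, 1.5],    # token 1: emission scores alone favor I-TENDEREE
    [0.0, 0.0, 2.0],    # token 2
]

best_path = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in best_path])
```

Picking the highest emission score at each token independently would label token 1 as I-TENDEREE directly after O, which is an invalid BIO sequence. Scoring whole sequences with the transition matrix avoids this, which is the kind of constraint a CRF layer can contribute on top of the BiLSTM features.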
