Abstract
With the rapid development of network media, the short text has become the main cover of information dissemination by quickly disseminating relevant entity information. However, the lack of context in the short text can easily lead to ambiguity, which will greatly reduce the efficiency of obtaining information and seriously affect the user’s experience, especially in the financial field. This paper proposed an entity disambiguation algorithm based on multi-word vector ensemble and decision to eliminate the ambiguity of entities and purify text information in information processing. First of all, we integrate a variety of unsupervised pre-trained word vector models as vector embeddings according to different word vector models’ characteristics. Moreover, we use the classic architecture of long short-term memory (LSTM) combined with the convolutional neural network (CNN) to fine-tune pre-trained Chinese word vectors such as BERT to integrate the output of entity recognition results. Then build the knowledge base and introduce the focal loss function on the basis of CNN and binary classification to improve the effect of entity disambiguation. Experimental results show that the algorithm performs better than the traditional entity disambiguation algorithm based on the single word vector. This method can accurately locate the entity to be disambiguated and has a good performance in disambiguation accuracy.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.