Abstract

Multimodal entity linking (MEL) is an emerging research field that is attracting increasing attention. However, previous work has typically focused on learning joint representations of mentions and entities and then determining the mention-entity relationship from these representations. As a result, such models are often very complex and ignore the relationships between information from different modalities across corpora. To address these problems, we propose a pre-training and fine-tuning paradigm for MEL. We design three categories of next sentence prediction (NSP) tasks for pre-training, i.e., mixed-modal, text-only, and multimodal, and double the amount of pre-training data by swapping the roles of the two sentences in each NSP pair. Experimental results show that our model outperforms the baseline models and that each of our pre-training strategies contributes to the improvement. In addition, our pre-training gives the final model strong generalization capability: it performs well even on smaller amounts of data.
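The data-doubling trick mentioned above can be sketched as follows. This is a minimal illustration of swapping the roles of the two sentences in each NSP pair; the function and field names are assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch: doubling NSP pre-training data by swapping
# the roles of the two sentences in each pair. Names are illustrative.

def double_nsp_pairs(pairs):
    """Given (sentence_a, sentence_b, is_next) examples, also emit the
    role-swapped pair, doubling the amount of NSP training data."""
    doubled = []
    for sent_a, sent_b, is_next in pairs:
        doubled.append((sent_a, sent_b, is_next))  # original order
        doubled.append((sent_b, sent_a, is_next))  # roles swapped
    return doubled

pairs = [("mention context", "entity description", 1)]
print(len(double_nsp_pairs(pairs)))  # 2
```

Under this sketch, the label is kept unchanged for the swapped pair, so each original example yields exactly two training instances.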
