Abstract

Named entity disambiguation (NED) identifies the specific meaning of an entity mention in a particular context and links it to a target entity. With the emergence of multimedia, content on the Internet has become more diverse in modality, which poses difficulties for traditional NED; moreover, the sheer volume of information makes it infeasible to manually label every kind of ambiguous data to train a practical NED model. In response, we present MMGraph, which uses multimodal graph convolution to aggregate visual and contextual language information for accurate entity disambiguation in short texts, and SimTri, a self-supervised simple triplet network that learns useful representations from unlabeled multimodal data to enhance the effectiveness of NED models. We evaluate these approaches on a new dataset, MMFi, which contains multimodal supervised data and large amounts of unlabeled data. Our experiments confirm the state-of-the-art performance of MMGraph on two widely used benchmarks and on MMFi, and show that SimTri further improves the performance of NED methods. The dataset and code are available at https://github.com/LanceZPF/NNED_MMGraph.
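To make the self-supervised triplet idea behind SimTri concrete, the sketch below shows a generic triplet objective over fused image and text features. This is a minimal illustration only, not the authors' implementation: it assumes PyTorch, and the encoder layout, feature dimensions, and sampling of positives/negatives are hypothetical placeholders rather than details taken from the paper or repository.

```python
# Minimal sketch of a triplet objective on multimodal embeddings (assumed setup,
# not the SimTri code from the paper's repository).
import torch
import torch.nn as nn

class SimpleTripletNet(nn.Module):
    """Projects concatenated image and text features into a shared embedding space."""
    def __init__(self, img_dim=2048, txt_dim=768, emb_dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 512),
            nn.ReLU(),
            nn.Linear(512, emb_dim),
        )

    def forward(self, img_feat, txt_feat):
        return self.proj(torch.cat([img_feat, txt_feat], dim=-1))

# Triplet loss pulls the anchor toward a positive sample (e.g. another view of
# the same mention) and pushes it away from a negative sample.
triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)

net = SimpleTripletNet()
anchor   = net(torch.randn(8, 2048), torch.randn(8, 768))
positive = net(torch.randn(8, 2048), torch.randn(8, 768))
negative = net(torch.randn(8, 2048), torch.randn(8, 768))
loss = triplet_loss(anchor, positive, negative)
loss.backward()
```

In a self-supervised setting such as the one the abstract describes, the anchor/positive/negative triples would be mined from unlabeled multimodal data rather than from the random tensors used here for illustration.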
