Abstract
Knowledge graphs are usually constructed to describe the concepts that exist in the real world and the relationships between them. Many knowledge graphs exist for specific fields, but they typically focus on text or structured data, ignore the visual information carried by images, and therefore cannot adequately support emerging visualization applications. To address this issue, we design a method that integrates visual and textual information derived from Wikimedia Commons to construct a domain-specific multi-modal knowledge graph, using the metallic materials domain as a running example. The text description of each image is treated as its context semantics, from which the image's context semantic labels are acquired as DBpedia resources. Furthermore, we adopt a deep neural network model, rather than simple visual descriptors, to acquire each image's visual semantic labels as concepts from WordNet. To fuse the visual semantic labels with the context semantic labels, a path-based concept extension and fusion strategy is proposed over the conceptual hierarchies of WordNet and DBpedia, obtaining effective extension concepts and the links between them, which increases the scale of the knowledge graph and strengthens the correlations between images. The experimental results show that the maximum extension level has a significant impact on the quality of the generated domain knowledge graph, and the best extension level is determined separately for DBpedia and WordNet. In addition, the results are compared with IMGpedia to further demonstrate the effectiveness of the proposed method.
Highlights
Research on the semantic representation of domain-specific data generally focuses on text or structured data, such as transforming structured databases into knowledge graphs to provide semantic query services [1] and extracting knowledge from unstructured source data to build new ontologies
The context semantic label set V_text of each image G_i is generated from the image-associated text in Wikimedia Commons using DBpedia Spotlight, yielding a set of entity labels that are resources in DBpedia (a minimal annotation sketch follows these highlights)
The visual semantic label set V_vision of each image G_i is acquired by applying a VGG-Net model trained on the ImageNet dataset to the image from Wikimedia Commons, yielding a set of visual semantic labels that are concepts in WordNet (see the classification sketch after these highlights)
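As a concrete illustration of the context labeling step, the following minimal sketch queries the public DBpedia Spotlight web service for a piece of image-associated text and collects the returned DBpedia resource URIs as context semantic labels. The endpoint, confidence threshold, and example text are illustrative assumptions rather than settings reported in the paper.

    import requests

    # Minimal sketch (assumption): annotate an image's associated text with the
    # public DBpedia Spotlight service and collect DBpedia resources as
    # context semantic labels.
    SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"

    def context_semantic_labels(text, confidence=0.5):
        """Return the set of DBpedia resource URIs mentioned in `text`."""
        response = requests.get(
            SPOTLIGHT_URL,
            params={"text": text, "confidence": confidence},
            headers={"Accept": "application/json"},
        )
        response.raise_for_status()
        return {r["@URI"] for r in response.json().get("Resources", [])}

    # Example: text attached to an image of a steel truss bridge
    print(context_semantic_labels("A truss bridge made of carbon steel over the Rhine."))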
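The visual labeling step can likewise be approximated with an off-the-shelf pretrained VGG model: classify the image against the ImageNet classes and map the predicted class names to WordNet synsets. This sketch relies on torchvision's bundled VGG16 weights and NLTK's WordNet interface; the paper trains its own VGG-Net model, so the pretrained weights and the simple name-to-synset lookup here are assumptions for illustration only.

    import torch
    from PIL import Image
    from torchvision import models
    from nltk.corpus import wordnet as wn

    # Minimal sketch (assumption): use torchvision's pretrained VGG16 instead of
    # a model trained from scratch, and map the predicted ImageNet class names
    # to WordNet noun synsets via NLTK.
    weights = models.VGG16_Weights.IMAGENET1K_V1
    model = models.vgg16(weights=weights).eval()
    preprocess = weights.transforms()

    def visual_semantic_labels(image_path, top_k=3):
        """Return WordNet synsets for the top-k predicted ImageNet classes."""
        image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            scores = model(image).softmax(dim=1)[0]
        synsets = []
        for idx in scores.topk(top_k).indices.tolist():
            label = weights.meta["categories"][idx]   # e.g. "suspension bridge"
            match = wn.synsets(label.replace(" ", "_"), pos=wn.NOUN)
            if match:
                synsets.append(match[0])
        return synsets

    # Example: visual_semantic_labels("steel_bridge.jpg")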
Summary
Research on the semantic representation of domain-specific data generally focuses on text or structured data, such as transforming structured databases into knowledge graphs to provide semantic query services [1] and extracting knowledge from unstructured source data to build new ontologies, rather than on multi-modal data such as image-title pairs. Based on the image information in Wikimedia Commons, DBpedia and WordNet are used to provide the domain background knowledge that connects the visual semantic labels and the context semantic labels. A domain-specific multi-modal knowledge fusion method is proposed, which combines the visual semantics of the images with their context semantics and fuses them based on the conceptual hierarchies of WordNet and DBpedia, so as to construct a domain-specific multi-modal knowledge graph. We train a VGG-Net [14] model on the ImageNet dataset to obtain the visual semantic labels of each image, which represent the image's visual semantic information using concepts derived from WordNet resources.
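The path-based concept extension can be pictured as walking a bounded number of steps up the WordNet hypernym hierarchy from each visual semantic label, keeping the traversed concepts and the subclass links between them; an analogous walk over the DBpedia hierarchy handles the context labels. The sketch below uses NLTK's WordNet interface; the triple format and the default level bound are illustrative assumptions, since the paper determines the best maximum extension level experimentally.

    from nltk.corpus import wordnet as wn

    # Minimal sketch (assumption): extend a WordNet concept upward by at most
    # `max_level` hypernym steps, returning the extension concepts and the
    # subclass links between them.
    def extend_concept(synset, max_level=3):
        concepts, links = {synset}, set()
        frontier = [synset]
        for _ in range(max_level):
            next_frontier = []
            for s in frontier:
                for parent in s.hypernyms():
                    links.add((s.name(), "subClassOf", parent.name()))
                    if parent not in concepts:
                        concepts.add(parent)
                        next_frontier.append(parent)
            frontier = next_frontier
        return concepts, links

    # Example: extend the visual label "steel" by two levels
    concepts, links = extend_concept(wn.synset("steel.n.01"), max_level=2)
    for triple in sorted(links):
        print(triple)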