Abstract

Advances in digital technology have promoted the construction of intangible cultural heritage (ICH) databases, but the data remain unorganized and poorly linked, which makes it hard for the public to gain an overall understanding of ICH. An ICH knowledge graph (KG) can help the public understand ICH and facilitate its protection. However, a general framework for ICH KG construction is still lacking. In this study, we take national-level Chinese ICH as an example and propose a framework for building a Chinese ICH KG that combines multiple data sources, namely Baike and the official website, which extends the scale of the KG. In addition, ICH data grow daily, so an efficient model is needed to extract knowledge from the data and update the KG in a timely manner. The KG is built on triples of the form 〈entity, attribute, attribute value〉, and we introduce the attribute value extraction (AVE) task. However, no public annotated Chinese ICH AVE corpus is available. To address this, we construct a Chinese ICH AVE corpus automatically using distant supervision (DS) rather than traditional manual annotation. AVE is usually treated as a sequence tagging task. In this paper, we instead treat ICH AVE as a node classification task and propose an AVE model, BGC, which combines a BiLSTM with a graph attention network and fuses word-level and character-level information by means of an ICH lexicon generated from the KG. We conduct extensive experiments and compare the proposed model with other state-of-the-art models. The experimental results demonstrate the superiority of the proposed model.
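To make the distant supervision idea concrete, the following is a minimal sketch of how KG triples could be used to label a sentence automatically. It is not the authors' pipeline: the function name `distant_label`, the BIO tag scheme, the exact string matching rule, and the toy triples are all assumptions chosen for illustration, and English text stands in for Chinese so that the character-level tags are easy to read.

```python
from typing import List, Tuple

# A KG fact in the form (entity, attribute, attribute value).
Triple = Tuple[str, str, str]


def distant_label(sentence: str, entity: str, triples: List[Triple]) -> List[str]:
    """Tag each character of `sentence` with a BIO label derived from KG triples.

    Assumption: an attribute value is labeled wherever its exact string
    appears in a sentence about the same entity (hypothetical matching rule).
    """
    tags = ["O"] * len(sentence)
    # Match longer values first so a short value cannot claim part of a longer one.
    candidates = sorted(
        (t for t in triples if t[0] == entity and t[2]),
        key=lambda t: len(t[2]),
        reverse=True,
    )
    for _, attribute, value in candidates:
        start = sentence.find(value)
        while start != -1:
            end = start + len(value)
            # Only label spans that no earlier (longer) value has already taken.
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = f"B-{attribute}"
                for i in range(start + 1, end):
                    tags[i] = f"I-{attribute}"
            start = sentence.find(value, end)
    return tags


# Toy data for illustration only.
triples = [
    ("Kunqu", "category", "traditional opera"),
    ("Kunqu", "region", "Jiangsu"),
]
sentence = "Kunqu is a traditional opera from Jiangsu."
tags = distant_label(sentence, "Kunqu", triples)

for char, tag in zip(sentence, tags):
    print(char, tag)
```

Exact matching of this kind is known to produce noisy labels (a value string may appear in a sentence without expressing the attribute), so a real corpus would likely need additional filtering.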
