Abstract

AbstractThe exponential growth of textual data in the digital era underlines the pivotal role of Knowledge Graphs (KGs) in effectively storing, managing, and utilizing this vast reservoir of information. Despite the copious amounts of text available on the web, a significant portion remains unstructured, presenting a substantial barrier to the automatic construction and enrichment of KGs. To address this issue, we introduce an enhanced Doc‐KG model, a sophisticated approach designed to transform unstructured documents into structured knowledge by generating local KGs and mapping these to a target KG, such as Wikidata. Our model innovatively leverages syntactic information to extract entities and predicates efficiently, integrating them into triples with improved accuracy. Furthermore, the Doc‐KG model's performance surpasses existing methodologies by utilizing advanced algorithms for both the extraction of triples and their subsequent identification within Wikidata, employing Wikidata's Unified Resource Identifiers for precise mapping. This dual capability not only facilitates the construction of KGs directly from unstructured texts but also enhances the process of identifying triple mentions within Wikidata, marking a significant advancement in the domain. Our comprehensive evaluation, conducted using the renowned WebNLG benchmark dataset, reveals the Doc‐KG model's superior performance in triple extraction tasks, achieving an unprecedented accuracy rate of 86.64%. In the domain of triple identification, the model demonstrated exceptional efficacy by mapping 61.35% of the local KG to Wikidata, thereby contributing 38.65% of novel information for KG enrichment. A qualitative analysis based on a manually annotated dataset further confirms the model's excellence, outshining baseline methods in extracting high‐fidelity triples. This research embodies a novel contribution to the field of knowledge extraction and management, offering a robust framework for the semantic structuring of unstructured data and paving the way for the next generation of KGs.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call