Abstract
There is much ongoing research on constructing knowledge bases from unstructured data. This process requires an ontology with enough properties to cover the various attributes of knowledge elements. As a huge encyclopedia, Wikipedia is a typical unstructured corpus of knowledge. DBpedia, a structured knowledge base constructed from Wikipedia, is based on the DBpedia ontology, which was created to represent the knowledge in Wikipedia well. However, the DBpedia ontology is driven by Wikipedia infoboxes. This means that although it is suitable for representing the essential knowledge of Wikipedia, it does not cover all of the knowledge in Wikipedia text. To overcome this problem, resources that represent the semantics or relations of words, such as WordNet and FrameNet, are considered useful. In this paper, we examined whether the DBpedia ontology is enough to cover a sufficient amount of the knowledge written in natural language in Wikipedia. We focused mainly on the Korean Wikipedia and calculated its coverage rate with two methods: by the DBpedia ontology and by FrameNet frames. To do this, we extracted sentences with extractable knowledge from Wikipedia text and extracted natural language predicates by part-of-speech tagging. We generated Korean lexicons for DBpedia ontology properties and frame indexes, and used these lexicons to measure the coverage ratio of the Korean Wikipedia by the DBpedia ontology and by frames. By our measurements, FrameNet frames cover 73.85% of the Korean Wikipedia sentences, which is a sufficient portion of Wikipedia text. Finally, we briefly discuss the limitations of DBpedia and FrameNet and, based on the experimental results, propose an outlook for constructing knowledge bases.
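The coverage measurement described above can be illustrated with a minimal sketch. It assumes that a sentence counts as covered when at least one of its extracted predicates appears in the lexicon; the abstract does not state the exact criterion, so this rule, the function name `coverage_ratio`, and the toy predicates and lexicons below are all hypothetical and are not the authors' data or implementation.

```python
from typing import Iterable, List, Set


def coverage_ratio(sentence_predicates: Iterable[List[str]], lexicon: Set[str]) -> float:
    """Return the fraction of sentences with at least one predicate in the lexicon.

    Assumption: a sentence is "covered" if any of its predicates matches a
    lexicon entry. The paper's actual criterion may differ.
    """
    sentences = list(sentence_predicates)
    if not sentences:
        return 0.0
    covered = sum(
        1 for predicates in sentences
        if any(p in lexicon for p in predicates)
    )
    return covered / len(sentences)


# Toy input: one list of POS-tagged predicates per sentence (hypothetical data).
sentences = [
    ["태어나다"],  # "be born"
    ["졸업하다"],  # "graduate"
    ["설립하다"],  # "establish"
    ["달리다"],    # "run"
]

# Hypothetical Korean lexicons for ontology properties and for frames.
dbpedia_lexicon = {"태어나다", "설립하다"}
frame_lexicon = {"태어나다", "졸업하다", "설립하다"}

dbpedia_coverage = coverage_ratio(sentences, dbpedia_lexicon)
frame_coverage = coverage_ratio(sentences, frame_lexicon)
print(f"DBpedia ontology coverage: {dbpedia_coverage:.2%}")
print(f"FrameNet frame coverage:   {frame_coverage:.2%}")
```

With these toy inputs the frame lexicon covers more sentences than the ontology lexicon only because it was constructed with more entries; the figures say nothing about the real corpus.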