The growing need to convert unstructured knowledge embedded in open-domain natural language text into machine-processable form requires inducting hard-to-extract structured knowledge into knowledge bases, a step toward making the Semantic Web vision a reality. In this context, ontologies and ontological knowledge (triples) play a vital role. This research introduces two novel concepts, Directed Collocation (DC) and Joined Directed Collocation (JDC), along with a methodical application of them to infer new ontological knowledge. A newly introduced Quality-Threshold-Value (QTV) parameter improves the quality of the inferred ontological knowledge. With QTV set to a moderate value (3), the approach inferred 95,491 new pieces of ontological knowledge from the 43,100 triples of an open-domain Sri Lankan English news corpus. The outcome was thus roughly twice the size of the source corpus. Some of the inferred knowledge was identical to content already in the original corpus, which is evidence of the accuracy of the approach. The remainder was validated using an inter-rater agreement method (with high reliability), and around 56% of it was estimated to be effective. The inferred outcome, which is in triple format, can be used in any knowledge base. The proposed approach is domain independent and thus helps to construct or extend ontologies for any domain with little or no help from human specialists.