Abstract

Driven by the digital revolution and growing competition in recent decades, almost all organizations and industries seek solutions for extracting information from unstructured documents. These documents contain information from multiple divergent domains, so a multidomain knowledge base is needed. Because recent research points to ontology as the predominant model, this work proposes a unified ontology modeling approach with multiple layers and divergent domains to support information processing from unstructured documents. The model is built by integrating relevant domains to facilitate cross-domain queries. Further, because the features of unstructured documents span multiple domains, domain identification must be performed before any information processing; the proposed ontology model is therefore used to identify the domain. The ontology is developed for the thermal power plant industry, and domain identification is demonstrated with an example. A statistical similarity index is proposed to associate the divergent, volatile features of unstructured text with ontology knowledge for domain identification, and the outcome of the proposal is evaluated using this index. A subsequent study on extracting information from the classified content, supported by directed acyclic graph relationships, is in progress. The merit of the proposal is that it can be used across multiple stages of information processing, each with a distinct purpose.
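The domain identification step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not define the proposed similarity index, so the score below (the fraction of document tokens that match a domain's terms) is only an assumed stand-in, and the domain names and term sets are hypothetical placeholders for ontology concepts, not the authors' ontology.

```python
import re
from collections import Counter

# Hypothetical term sets standing in for ontology concepts per domain.
DOMAIN_TERMS = {
    "boiler": {"boiler", "steam", "drum", "furnace", "superheater"},
    "turbine": {"turbine", "rotor", "blade", "bearing", "vibration"},
    "electrical": {"generator", "transformer", "voltage", "breaker", "relay"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def similarity(tokens, terms):
    """Assumed index: share of tokens that match a domain's terms."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[term] for term in terms)
    return hits / len(tokens)


def identify_domain(text):
    """Return (best_domain_or_None, per-domain scores) for a text."""
    tokens = tokenize(text)
    scores = {d: similarity(tokens, t) for d, t in DOMAIN_TERMS.items()}
    best = max(scores, key=scores.get)
    # Report no domain when nothing in the text matches any term set.
    if scores[best] == 0.0:
        return None, scores
    return best, scores


if __name__ == "__main__":
    report = "High vibration observed on turbine rotor bearing after steam admission"
    print(identify_domain(report))
```

In this sketch a text is assigned to the highest-scoring domain, and the per-domain scores are kept so that content spanning several domains remains visible rather than being forced into a single label.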
