Abstract

Background: Free-text documents on science and technology are growing in number on the Web. However, most of these documents are not immediately processable by computers, which slows the acquisition of useful information. Computational ontologies are a possible solution because they enable semantically machine-readable data sets. However, the process of ontology creation, instantiation, and maintenance still relies on manual methodologies and is therefore time- and cost-intensive.

Method: We focused on a large corpus containing information on researchers, research fields, and institutions. We based our strategy on traditional entity recognition, social computing, and correlation. We devised a semi-automatic approach for the recognition, correlation, and extraction of named entities and relations from textual documents, which are then used to create, instantiate, and maintain an ontology.

Results: We present a prototype demonstrating the applicability of the proposed strategy, along with a case study describing how direct and indirect relations can be extracted from academic and professional activities registered in a database of curricula vitae in free-text format. We present evidence that this system can identify entities to assist in the process of knowledge extraction and representation to support ontology maintenance. We also demonstrate the extraction of relationships among ontology classes and their instances.

Conclusion: We have demonstrated that our system can be used to convert research information in free-text format into a database with a semantic structure. Future studies should test this system using the growing volume of free-text information available at the institutional and national levels.

Highlights

  • The volume of Web-based, free-text documents containing information on science and technology is growing at an increasing rate [1]

  • We present evidence that this system can identify entities to assist in the process of knowledge extraction and representation to support ontology maintenance

  • We demonstrate the extraction of relationships among ontology classes and their instances


Summary

Introduction

The volume of Web-based, free-text documents containing information on science and technology is growing at an increasing rate [1]. Because these documents are not immediately processable by computers in their original format, acquiring useful information from them takes longer, which might lead to pressure from academic institutions, governments, and industry to turn this raw data into useful information. Imagine a data set containing information about a group of researchers from a given university, including their names, institutions, publications, patents, and the classes they teach. This information changes over time: every year, each faculty member adds more of these academic products. The process of ontology creation, instantiation, and maintenance is still based on manual methodologies and is therefore time- and cost-intensive.
