Ontology Expansion Research Articles

Extracting data from semi-structured sources is a topic that has received a lot of attention. Wrapper induction in particular has been studied extensively: users annotate a few data sources with examples of the data they want, after which a procedure (a wrapper) is constructed that can, ideally, extract similar data as well. In this paper a novel wrapper induction approach is proposed that exploits the general applicability of the XPath query language and is studied specifically within the context of web pages. After a user annotates a limited set of web pages with the required data, a generalised XPath is constructed that is capable of extracting the examples and, ideally, similar data as well. This generalised baseline XPath is then enriched with predicates, based on the context and structure of the data sources, to optimise the precision/recall balance of the wrapper's data extraction capability. Six variations of such limiting predicates are introduced and investigated. It is shown that the baseline approach often generalises the samples too much, leading to decreased precision. Enriching the baseline wrapper with predicates limits the generalisation power of the queries in an intelligent manner. Experimental results show a significant improvement in the overall precision of the generalised query, without an excessive loss in recall. Documented tests and real-world experience with a large amount of data show that the technique is flexible, easily understood and applicable in a broad range of applications. It is not only of interest in the field of web information retrieval, but can also be used in the contexts of, e.g., reverse engineering of databases, ontology expansion and deep web data mining, as both simple lists of data and complex structures can be extracted.
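The effect of predicate enrichment described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the page, the two XPath expressions and the helper names (`extract`, `precision_recall`) are assumptions made up for illustration, and only one simple attribute predicate is shown rather than the six variations the paper investigates. The sketch uses the limited XPath subset of Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page: a navigation list and a product list.
PAGE = """
<html>
  <body>
    <ul class="nav">
      <li>Home</li>
      <li>Contact</li>
    </ul>
    <ul class="products">
      <li>Laptop</li>
      <li>Phone</li>
      <li>Tablet</li>
    </ul>
  </body>
</html>
"""


def extract(root, xpath):
    """Return the text of every element matched by the XPath expression."""
    return [el.text for el in root.findall(xpath)]


def precision_recall(extracted, wanted):
    """Compute precision and recall of the extracted values against the wanted ones."""
    hits = len(set(extracted) & set(wanted))
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(wanted) if wanted else 0.0
    return precision, recall


root = ET.fromstring(PAGE)
wanted = ["Laptop", "Phone", "Tablet"]  # the data the user annotated

# Baseline: a generalised path that matches every list item on the page.
baseline = ".//li"
# Enriched: the same path limited by a predicate on the parent's class attribute.
enriched = ".//ul[@class='products']/li"

baseline_pr = precision_recall(extract(root, baseline), wanted)
enriched_pr = precision_recall(extract(root, enriched), wanted)

print("baseline precision/recall:", baseline_pr)
print("enriched precision/recall:", enriched_pr)
```

In this toy page the baseline query also picks up the navigation items, so its precision is lower than that of the enriched query, while both recover all annotated values.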

Background: It is time-consuming to build an ontology with many terms and axioms, so it is desirable to automate the process of ontology development. Ontology Design Patterns (ODPs) provide reusable solutions to recurrent modeling problems in ontology engineering. Because ontology terms often follow specific ODPs, the Ontology for Biomedical Investigations (OBI) developers proposed a Quick Term Templates (QTTs) process targeted at generating new ontology classes following the same pattern, using term templates in a spreadsheet format.

Results: Inspired by the ODPs and QTTs, the Ontorat web application was developed to automatically generate new ontology terms, annotations of terms, and logical axioms based on one or more specific ODPs. The inputs of an Ontorat execution include axiom expression settings, an input data file, ID generation settings, and an optional target ontology. The axiom expression settings can be saved for reuse as a text file in a predesigned Ontorat setting format. The input data file is generated based on a template file (text or Excel format) created for a specific ODP. Ontorat is an efficient tool for ontology expansion. Different use cases are described. For example, Ontorat was applied to automatically generate over 1,000 Japan RIKEN cell line cell terms with both logical axioms and rich annotation axioms in the Cell Line Ontology (CLO). Approximately 800 licensed animal vaccines were represented and annotated in the Vaccine Ontology (VO) by Ontorat. The OBI team used Ontorat to add assay and device terms required by the ENCODE project. Ontorat was also used to add missing annotations to all existing Biobank-specific terms in the Biobank Ontology. A collection of ODPs and templates with examples is provided on the Ontorat website and can be reused to facilitate ontology development.

Conclusions: With ever-increasing ontology development and applications, Ontorat provides a timely platform for generating and annotating a large number of ontology terms by following design patterns.

Availability: http://ontorat.hegroup.org/
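The template-driven idea in this abstract (rows of input data, ID generation settings, and an axiom expression that follows a design pattern) can be sketched roughly as follows. This is not Ontorat's actual setting format or output: the column names, the `EX` prefix, the axiom pattern and the function `generate_terms` are all hypothetical, chosen only to show how one pattern can be instantiated once per spreadsheet row.

```python
import csv
import io

# Hypothetical input data file: one row per new term, columns defined by the template.
TEMPLATE_ROWS = """label,parent,organism
example cell line A,cell line cell,example organism X
example cell line B,cell line cell,example organism Y
"""

# Hypothetical axiom expression: placeholders refer to column names in the input file.
AXIOM_PATTERN = "'{label}' SubClassOf '{parent}' and ('derives from' some '{organism}')"


def generate_terms(rows, id_prefix, start, digits, axiom_pattern):
    """Create one term per row with a sequential ID and an axiom filled in from the pattern."""
    terms = []
    for offset, row in enumerate(rows):
        # ID generation settings: prefix, starting number and zero-padded width.
        term_id = f"{id_prefix}_{start + offset:0{digits}d}"
        terms.append({
            "id": term_id,
            "label": row["label"],
            "axiom": axiom_pattern.format(**row),
        })
    return terms


rows = list(csv.DictReader(io.StringIO(TEMPLATE_ROWS)))
terms = generate_terms(rows, id_prefix="EX", start=1, digits=7, axiom_pattern=AXIOM_PATTERN)

for term in terms:
    print(term["id"], "|", term["label"], "|", term["axiom"])
```

Keeping the pattern separate from the data rows mirrors the reuse described above: the same saved settings can be applied to a new input file without rewriting the axioms by hand.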

Articles published on Ontology Expansion

Ontology expansion based on UWN reusability

An ontology constructing technology oriented on massive social security policy documents

An Ontology-Based Model for Treatment Guidelines of Internet and Games Addiction

Predicate enrichment of aligned XPaths for wrapper induction

Ontorat: automatic generation of new ontology terms, annotations, and axioms based on ontology design patterns.

Ontology expansion: appending with extracted sub-ontology

Retrieval strategy based on semantic expansion of keywords
