Abstract


 
 
 Every day a huge number of new information resources are linked to the web. As a result, the web is growing very fast, making search increasingly difficult and its results poorer. To address this problem, several initiatives were undertaken and a new area of research and development emerged: the Semantic Web. When we refer to the Semantic Web we have in mind a network of concepts. Each concept has a group of related resources and can be related to other concepts; this concept network can then be used to navigate among web resources, or more generally among information resources. One of these initiatives became an ISO standard: Topic Maps (ISO 13250).
 The aim of this paper is to introduce a Topic Map (TM) Builder, that is, a processor that extracts topics and relations from instances of a family of XML documents. A TM-Builder is strongly dependent on the structure of the resources. So, to extract topic maps from different collections of information resources (sets of documents with different structures), we would have to implement several TM-Builders, one for each collection, which is not easy. To overcome this inconvenience we have created an XML abstraction layer for TM-Builders that lets us specify the topic map we want to build from a concrete family of resources and then generate the intended extractor automatically.
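To make this structural dependence concrete, the following minimal Python sketch shows what a hand-written TM-Builder for a single document family might look like. The document structure (`conference`, `paper`, `title` and `author` elements with `id` attributes), the `TopicMap` class and the `written-by` association are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class TopicMap:
    # topic id -> (topic type, display name)
    topics: dict = field(default_factory=dict)
    # list of (association type, (member id, member id))
    associations: list = field(default_factory=list)


def build_conference_tm(xml_text: str) -> TopicMap:
    """Hand-written TM-Builder tied to ONE hypothetical document structure."""
    tm = TopicMap()
    root = ET.fromstring(xml_text)
    # Element and attribute names are hard-coded: any change in the
    # document type means rewriting these lines by hand.
    for paper in root.iter("paper"):
        paper_id = paper.get("id")
        tm.topics[paper_id] = ("paper", paper.findtext("title"))
        for author in paper.iter("author"):
            author_id = author.get("id")
            tm.topics[author_id] = ("author", author.text)
            tm.associations.append(("written-by", (paper_id, author_id)))
    return tm


sample = '<conference><paper id="p1"><title>XSTM</title><author id="a1">Ana</author></paper></conference>'
tm = build_conference_tm(sample)
```

Because element and attribute names are hard-wired, a collection with a different structure would need another function like this one, which is the maintenance burden the abstraction layer is meant to remove.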
 To describe that process, i.e. the extraction of knowledge from XML documents to produce a TM, we present a language for specifying topic maps for a class of XML documents, which we call XSTM (XML Specification for Topic Maps). We also discuss XSTM-P, an XSL processor that automatically generates the extractor from its formal specification written in XSTM.
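The idea of deriving the extractor from a specification can be sketched in Python, under explicit assumptions: the dictionaries `CONFERENCE_SPEC` and `JOURNAL_SPEC` are a hypothetical stand-in for an XSTM document (they do not follow XSTM syntax), and the hypothetical `generate_extractor` returns a Python closure instead of the XSL-based generation performed by XSTM-P. Only the spec changes between the two document families; the generator stays the same.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical declarative spec standing in for an XSTM document (NOT XSTM syntax).
# Paths use the XPath subset supported by ElementTree, relative to the root element.
CONFERENCE_SPEC = {
    "topics": {
        # topic type -> where its instances are, which attribute is the id,
        # and where the display name is ("." = the element's own text)
        "paper": {"path": "paper", "id": "id", "name": "title"},
        "author": {"path": "paper/author", "id": "id", "name": "."},
    },
    "associations": [
        # "path" is relative to each element of the "from" topic type
        {"type": "written-by", "from": "paper", "to": "author", "path": "author"},
    ],
}

# A differently structured family needs only a new spec, not new extractor code.
JOURNAL_SPEC = {
    "topics": {
        "paper": {"path": "article", "id": "key", "name": "headline"},
        "author": {"path": "article/writer", "id": "key", "name": "."},
    },
    "associations": [
        {"type": "written-by", "from": "paper", "to": "author", "path": "writer"},
    ],
}


def generate_extractor(spec: dict) -> Callable[[str], dict]:
    """Build an extractor (a stand-in for a generated TM-Builder) from a spec."""
    topic_rules = spec["topics"]
    assoc_rules = spec["associations"]

    def extract(xml_text: str) -> dict:
        root = ET.fromstring(xml_text)
        topics, associations = {}, []
        # Topic instances: one per element matched by each topic rule.
        for ttype, rule in topic_rules.items():
            for el in root.findall(rule["path"]):
                name = el.text if rule["name"] == "." else el.findtext(rule["name"])
                topics[el.get(rule["id"])] = (ttype, name)
        # Associations: link each "from" element to the "to" elements below it.
        for arule in assoc_rules:
            src = topic_rules[arule["from"]]
            dst = topic_rules[arule["to"]]
            for el in root.findall(src["path"]):
                for member in el.findall(arule["path"]):
                    pair = (el.get(src["id"]), member.get(dst["id"]))
                    associations.append((arule["type"], pair))
        return {"topics": topics, "associations": associations}

    return extract


conference_doc = '<conference><paper id="p1"><title>XSTM</title><author id="a1">Ana</author></paper></conference>'
journal_doc = '<journal><article key="x1"><headline>Topic Maps</headline><writer key="w1">Rui</writer></article></journal>'

conference_tm = generate_extractor(CONFERENCE_SPEC)(conference_doc)
journal_tm = generate_extractor(JOURNAL_SPEC)(journal_doc)
```

Supplying a spec for yet another structure yields a new extractor without touching `generate_extractor`; separating what to extract from how to extract it is the point of the abstraction layer.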
 
 

Highlights

  • This paper is concerned with knowledge extraction from documents marked up in XML

  • A topic map navigator gives access to the information contained in the source documents, filtered by the Topic Map specification, allowing navigation through the topic instances driven by the associations defined in the ontology specified in XSTM

  • This paper introduces an architecture for the automatic construction of topic maps that uses XSL stylesheets to process a family of XML documents (the information resources)


Summary

Introduction

This paper is concerned with knowledge extraction from documents marked up in XML. To go straight to the point, we assume that the reader is familiar with XML and its companion technologies for document structuring and processing. We intend to show that a knowledge extraction tool such as a TM-Builder can be generated automatically instead of being written by hand each time the document type changes. In this context, we concluded that a Topic Maps specification language was necessary to enable the systematic derivation of a TM-Builder. A formal specification of the proposed language, XSTM, is provided through a diagram depicting its XML Schema [10] and a listing of the corresponding DTD (Document Type Definition); that section also details the elements introduced in the DTD and illustrates their use through examples. For readers more familiar with grammar-based language definitions, we include a CFG (Context-Free Grammar) for XSTM as well. A synthesis of the paper and hints on future work are presented in the last part, section 6, together with some metrics about XSTM use.

Ontologies
Topic Maps
The characteristics of the Topic Map model
How to define a Topic Map
The TM-Builder – The Topic Map Extractor
XSTM: an XML Language to specify Topic Map extractors
XSTM-p: an XSTM Processor
Case-Study – Conference website specification and generation
The XSTM specification for the XATA
Conclusion
