Crawling the Web with OntoDir

Antonio Picariello, Antonio M. Rinaldi

doi:10.1007/978-3-540-74469-6_71

Abstract

Managing the large amount of information on the Internet requires more efficient and effective methods and techniques for mining and representing information. The use of ontologies for knowledge representation has grown rapidly in recent years: a common and formal representation of knowledge allows a more accurate analysis of document content in several contexts. One challenging application is the Web: the World Wide Web now has requirements of this kind that are hard to satisfy, especially in a scenario as complex as the Semantic Web. In this paper we present a methodology for automatic topic annotation of Web pages. We describe an algorithm for word sense disambiguation that uses a dedicated metric for measuring semantic relatedness, and we show a technique that detects the topic of the analyzed document by means of ontologies extracted from a knowledge base. The strategy is implemented in a system that uses this information to build a topic hierarchy that is created automatically rather than defined a priori. Experimental results are presented and discussed in order to measure the effectiveness of our approach.

Keywords

Semantic Relatedness, Word Sense Disambiguation, Topic Detection, Topic Hierarchy, General Knowledge Base

These keywords were added by machine and not by the authors.
