Feature identification for topical relevance assessment in feed search engines

Yongwook Shin, Jonghun Park

doi:10.3233/ida-130602

Abstract

Feeds have become a popular way to distribute and acquire information on the web effectively. The explosive growth of feeds demands a search engine that can help users quickly discover feeds that match their interests. The retrieval effectiveness of a feed search engine depends heavily on the relevance assessment method that determines the candidates for ranking query results. However, existing relevance assessment approaches proposed for web page retrieval may produce unsatisfactory results because feeds differ in character from traditional web pages. Unlike a web page, a feed is a dynamic document: it continually generates information on specific topics. In addition, it is a structured document that consists of several data elements, such as title and description. Accordingly, a relevance assessment method for feed retrieval needs to address these unique characteristics of feeds effectively. This paper considers the problem of identifying significant features, a feature set created from feed data elements, with the aim of improving the effectiveness of feed retrieval while reducing computational cost. We conducted extensive experiments on real-world data sets using a support vector machine to investigate the problem, and identified significant features that can be employed in feed search services.

Full Text