Abstract

Search engines typically answer a query with low precision, retrieving many irrelevant Web pages and missing some important ones. In this paper, we study the problem of hierarchically clustering Web search results. In particular, we propose WISE, a meta-search engine architecture that automatically builds clusters of related Web pages, each embodying one meaning of the query. These clusters are then organized hierarchically, and each is labeled with a phrase representing the key concept of the cluster and its corresponding Web documents. The system, which has a Web-based interface (soon available at wise.di.ubi.pt), introduces some interesting new ideas, such as the preselection of the retrieved Web pages, the capacity to statistically detect phrases within documents, and the representation of documents by their most relevant key concepts using Web content mining techniques. The final step of the system is supported by a graph-based overlapping clustering algorithm that groups the selected documents into a hierarchy of clusters.
