A Theme-based Search Technique

Nida Al-Chalabi,Khalil Shihab

doi:10.1109/cec-eee.2007.15

Abstract

The current search engines usually return a large number of irrelevant documents for a certain query. As a result, accessing such information and filtering out these documents can cause frustration and often result in waste of time and effort for the users while surfing the web. This is mainly because of the underlying techniques used in these engines. These techniques are mostly based in the frequency of the keywords of the query in the HTML code. In addition, issues such as dealing with classifying the pages found for a query according to previous visits along with features needed to make intelligent decisions regarding the access patterns of the users are not considered. This work presents an intelligent search engine, called ORCA that returns the most relevant documents for user's queries. This search engine analyses the queries and builds themes (models) to be used when the engine is confronted with similar queries. The intelligent component is used for constructing a model of the user behavior and using that model to fetch and even prefetch information and documents considered of interest to the user. It uses both latent semantic analysis and web page feature selection for clustering web pages. Latent semantic analysis is used to find the semantic relations between keywords, and between documents.

Full Text