Research activities in database management and information retrieval at University of Illinois at Chicago

Isabel Cruz, Bing Liu, Clement Yu, Ashfaq Khokhar, Prasad Sistla, Ouri Wolfson

doi:10.1145/601858.601882

Abstract

Today, millions of people use powerful search engines such as Google to retrieve information from the Web on a daily basis. Despite this success, such search engines have several problems. First, a single search engine captures only a few billion pages, while it has been reported that the entire Web has about 500 billion pages and is rapidly growing. Thus, the coverage of the Web by a single search engine is rather small. Second, an index database has to be built to contain the key information of the captured Web pages. This database is huge and takes a substantial amount of time to refresh. Thus, it is not surprising that a substantial amount of information in the index database can be weeks out of date. Third, retrieving information from such a large database under a large number of queries requires enormous hardware resources. It has been reported that Google uses many thousands of computers.
