Data integration systems on the Deep Web offer a transparent means of querying multiple data sources at once. Result merging, the generation of an overall ranked list of results from different sources in response to a query, is a key component of a data integration system. In this work we present a result merging model called the Active Relevance Weight Estimation model. Unlike existing result merging techniques, we estimate the relevance of a data source in answering a query at query time. The relevance of each source in a set of data sources is expressed with a (normalized) weighting scheme: the larger the weight of a data source, the more relevant the source is in answering the query. For each training query, we estimate the weight of a data source in each subset of the data sources involved in that query. Because an online query may not exactly match any training query, we devise methods to obtain a subset of training queries that are related to the online query, and we estimate the relevance weights for the online query from the weights of this subset. Our experiments show that our method outperforms the leading merging algorithms with comparable response time.
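
The general idea of weighting sources at query time can be illustrated with a minimal sketch. It is not the model's actual estimation procedure, which the abstract does not specify. Several things here are assumptions made for illustration: the use of Jaccard term overlap to find related training queries, the similarity-weighted average of their source weights, the uniform fallback, and the hypothetical names `jaccard`, `estimate_weights`, and `merge`.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two term collections (assumed relatedness measure)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def estimate_weights(online_terms, training, sources, k=2):
    """Estimate normalized source weights for an online query.

    training: list of (terms, {source: weight}) pairs, one per training query.
    """
    uniform = {s: 1.0 / len(sources) for s in sources}

    # Keep the k training queries most similar to the online query.
    scored = [(jaccard(online_terms, terms), weights) for terms, weights in training]
    related = sorted((p for p in scored if p[0] > 0),
                     key=lambda p: p[0], reverse=True)[:k]
    if not related:
        return uniform  # assumption: fall back to equal weights

    # Similarity-weighted sum of the related queries' source weights.
    raw = defaultdict(float)
    for sim, weights in related:
        for s in sources:
            raw[s] += sim * weights.get(s, 0.0)

    total = sum(raw.values())
    if total == 0:
        return uniform
    # Normalize so that the weights sum to 1.
    return {s: raw[s] / total for s in sources}


def merge(results_by_source, weights):
    """Merge per-source (item, local_score) lists into one ranked list."""
    merged = [
        (item, weights.get(src, 0.0) * score)
        for src, results in results_by_source.items()
        for item, score in results
    ]
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged


# Toy data, invented for this example only.
training = [
    (["cheap", "flights"], {"A": 0.8, "B": 0.2}),
    (["hotel", "deals"], {"A": 0.1, "B": 0.9}),
]
weights = estimate_weights(["cheap", "flights", "paris"], training, ["A", "B"])
ranked = merge({"A": [("a1", 0.5)], "B": [("b1", 0.9)]}, weights)
```

In this toy run only the first training query overlaps with the online query, so source A receives the larger weight and its result is ranked ahead of B's despite a lower local score.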