Abstract

In data integration systems, a central site often maintains a global catalog of all available data sources, along with statistics that allow the query optimizer to generate a good query plan. These statistics may be updated lazily during query execution. A user query is often broken into several query fragments, and a centralized task scheduler schedules the execution of each fragment, fetching data from the various data sources. The results are then integrated at the central site and presented to the user. As new data sources are introduced, the global catalog needs to be updated from time to time. However, because the data sources are autonomous and maintained by local administrators, it is difficult to ensure accurate statistics as well as the availability of the data sources. In addition, since the data are integrated at the central site, the central site could become a bottleneck. The unpredictable nature of the wide area environment further exacerbates the problem of query processing.

In this paper, we present our ongoing work on dbRouter, a distributed query optimization and processing framework for open environments. The dbRouter provides mechanisms to facilitate the discovery of new data sources, performs distributed query optimization, and manages the routing of data to its destination for processing.

Keywords: Query Processing, Query Optimization, Processing Framework, Output Stream, Data Integration System

These keywords were added by machine and not by the authors.
