Abstract
Query processing in a data integration system is complicated by a lack of quality statistics about the data, unpredictable and bursty data transfer rates, and slow or unavailable data sources. Conventional query processing algorithms, which are based on a blocking execution model, are no longer attractive because of their long initial response time. Moreover, the execution engine may be stalled by slow data delivery rates or unavailable data sources. In this paper, we adopt a non-blocking execution model for evaluating queries. We propose a symmetric partition-based join algorithm, called AJoin, that can operate with a small memory requirement, produce the first few answer tuples quickly, and block only when all available data have been examined. We also examine heuristics to manage the partitions and address the memory management issues of AJoin. To evaluate multi-join query plans, we also propose two new strategies, m-AJoin and Pm-AJoin. Both strategies evaluate each join operation using AJoin. While m-AJoin accesses the data of each remote source in its entirety, Pm-AJoin accesses remote data in chunks of smaller partitions. Our performance study shows the effectiveness of the proposed approaches for join and multi-join processing in a multi-user data integration system.
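To make the non-blocking, symmetric execution model concrete, the following is a minimal sketch of a generic in-memory symmetric hash join. It is not AJoin itself: the abstract does not give the algorithm's details, so the partitioning of hash tables, the spilling of partitions under memory pressure, and the partition management heuristics are all omitted. The function name `symmetric_hash_join` and its parameters are hypothetical, and the two inputs are assumed to be ordinary Python iterables standing in for remote sources.

```python
from collections import defaultdict
from itertools import zip_longest

# Sentinel marking that one input has run out before the other.
_MISSING = object()


def symmetric_hash_join(left, right, left_key, right_key):
    """Yield matching (left_tuple, right_tuple) pairs as soon as they are found.

    Illustrative sketch only: both hash tables are kept entirely in memory,
    whereas a partition-based algorithm would bound memory by partitioning.
    """
    left_table = defaultdict(list)   # join key -> left tuples seen so far
    right_table = defaultdict(list)  # join key -> right tuples seen so far

    # Read the two inputs alternately, so neither has to be consumed
    # completely before the first answer can be produced.
    for l, r in zip_longest(left, right, fillvalue=_MISSING):
        if l is not _MISSING:
            k = left_key(l)
            left_table[k].append(l)            # insert into own table
            for match in right_table.get(k, ()):  # probe the other table
                yield (l, match)
        if r is not _MISSING:
            k = right_key(r)
            right_table[k].append(r)
            for match in left_table.get(k, ()):
                yield (match, r)
```

Because every arriving tuple is first inserted into its own table and then probed against the other, each matching pair is emitted exactly once, at the moment the later of its two tuples arrives. This is what allows the first answers to appear before either input has been read in full.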