Abstract

Semi-stream processing, the operation of joining a data stream with disk-based, non-stream master data, is a crucial component of near-real-time data warehousing. A semi-stream join must process tuples quickly and accurately, and it must work well with limited memory. The semi-stream algorithms currently presented in the literature, such as MeshJoin, Semi-Stream Index Join, and CacheJoin, can join only one foreign key in the stream data with one table in the master data. However, stream data are quite likely to carry multiple foreign keys that need to be joined with multiple tables in the master data. We extend CacheJoin into three new approaches to multi-way semi-stream joins: Sequential, Semi-concurrent, and Concurrent joins. As presented, the new algorithms join two foreign keys in the stream data with two tables in the master data, but they can be easily generalized to join with any number of master data tables. We evaluated the performance of all three algorithms, and our results show that the Semi-concurrent architecture performs best when all three are run under the same scenario.
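To make the sequential idea concrete, the following minimal Python sketch joins each stream tuple with one master table and then with a second. It is an illustration under stated assumptions, not the algorithm evaluated in the paper: the in-memory dictionaries stand in for disk-based master tables, the small LRU cache is a simplified stand-in for CacheJoin's caching of master data, and all names (`CachedLookup`, `sequential_join`, `product_id`, `store_id`) are hypothetical.

```python
from collections import OrderedDict


class CachedLookup:
    """Look up master-data rows by key, keeping recently used rows in an LRU cache."""

    def __init__(self, master_table, capacity):
        self.master_table = master_table  # assumption: a dict stands in for a disk table
        self.capacity = capacity          # maximum number of rows held in memory
        self.cache = OrderedDict()
        self.disk_reads = 0               # counts simulated disk accesses

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # mark the row as most recently used
            return self.cache[key]
        self.disk_reads += 1              # cache miss: go to the (simulated) disk
        row = self.master_table.get(key)
        if row is not None:
            self.cache[key] = row
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used row
        return row


def sequential_join(stream, lookup_a, key_a, lookup_b, key_b):
    """Join each stream tuple with table A first, then with table B."""
    for tup in stream:
        row_a = lookup_a.get(tup[key_a])
        if row_a is None:
            continue                      # no match in table A: drop the tuple
        row_b = lookup_b.get(tup[key_b])
        if row_b is None:
            continue                      # no match in table B: drop the tuple
        yield {**tup, **row_a, **row_b}


# Hypothetical master data and stream tuples, for illustration only.
products = {1: {"product_name": "pen"}, 2: {"product_name": "ink"}}
stores = {10: {"store_city": "Auckland"}}
stream = [
    {"product_id": 1, "store_id": 10, "qty": 3},
    {"product_id": 1, "store_id": 10, "qty": 5},
    {"product_id": 2, "store_id": 99, "qty": 1},  # store 99 has no master row
]

product_lookup = CachedLookup(products, capacity=2)
store_lookup = CachedLookup(stores, capacity=2)
joined = list(
    sequential_join(stream, product_lookup, "product_id", store_lookup, "store_id")
)
```

In this sketch the second tuple is served entirely from the caches, which is the effect a cache-based semi-stream join relies on when some master rows are requested far more often than others. Generalizing to more tables would mean chaining further lookups in the same way.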
