Continuous data streams are information sources in which data arrives in high volume and in unpredictable, rapid bursts. Processing data streams is challenging because (1) random access to fast, large data streams is problematic with present storage technologies and (2) exact answers from data streams are often too expensive to compute. This paper specifies a framework for building a Grid-based Zero-Latency Data Stream Warehouse (GZLDSWH) that overcomes the resource limitations of data stream processing without resorting to approximation approaches. The GZLDSWH is built upon a set of Open Grid Services Infrastructure (OGSI)-based services and Globus Toolkit 3 (GT3), and is capable of capturing and storing continuous data streams, performing analytical processing, and reacting autonomously in near real time to certain kinds of events based on a well-established knowledge base. The paper describes the requirements of a GZLDSWH, its Grid-based conceptual architecture, and the operations of its services. It also discusses and investigates several challenges and issues in building a GZLDSWH, such as the Dynamic Collaboration Model between the Grid services, the Analytical Model, and the Design and Evaluation aspects of the Knowledge Base Rules.