A Data as a Product Model for Future Consumption of Big Stream Data in Clouds

Guangyan Huang, Yanchun Zhang, Jing He, Wanlei Zhou, Chi‐Hung Chi

doi:10.1109/scc.2015.43

Abstract

Data is becoming the world's new natural resource, and the use of big data is growing quickly. The trend in computing technology is that everything is merged into the Internet, and big data are integrated to provide complete information for collective intelligence. As big data keep growing in size, a new research direction is to refine the data themselves: reduce their size while keeping the critical data (or useful information). In this paper, we propose a novel data consumption model that separates the consumption of data from the raw data and thus enables cloud computing for big data applications. We define a new Data-as-a-Product (DaaP) concept: a data product is a small-sized summary of the original data that can directly answer users' queries. Accordingly, we separate the mining of big data into two classes of processing modules: refine modules, which turn raw big data into small-sized data products, and application-oriented mining modules, which discover the knowledge that applications need from well-defined data products. Our practice in mining big stream data, including medical sensor streams, text streams and trajectory data, demonstrates the efficiency and precision of the DaaP model in answering users' queries.
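To make the two-class separation concrete, here is a minimal Python sketch. It assumes a numeric sensor stream and uses a fixed-size window summary (count, mean, minimum, maximum) as the data product. `DataProduct`, `refine` and `windows_with_peak_above` are hypothetical names chosen for illustration. This is not the paper's implementation, and the abstract does not specify how data products are actually built.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical small-sized summary of one window of a raw stream."""
    window: int     # index of the window this summary covers
    count: int      # number of raw readings summarized
    mean: float
    minimum: float
    maximum: float


def _summarize(index: int, values: List[float]) -> DataProduct:
    """Collapse one window of raw readings into a single summary record."""
    return DataProduct(index, len(values), sum(values) / len(values),
                       min(values), max(values))


def refine(stream: Iterable[float], window_size: int) -> List[DataProduct]:
    """Refine module (hypothetical sketch): raw readings -> window summaries.

    Only the summaries are returned; the raw readings are not retained.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    products: List[DataProduct] = []
    buffer: List[float] = []
    for value in stream:
        buffer.append(value)
        if len(buffer) == window_size:
            products.append(_summarize(len(products), buffer))
            buffer = []
    if buffer:  # summarize a trailing partial window, if any
        products.append(_summarize(len(products), buffer))
    return products


def windows_with_peak_above(products: Iterable[DataProduct],
                            threshold: float) -> List[int]:
    """Application-oriented module (hypothetical sketch).

    Answers a query using only the data products, never the raw stream.
    """
    return [p.window for p in products if p.maximum > threshold]


if __name__ == "__main__":
    # Hypothetical heart-rate readings, summarized in windows of four.
    readings = [70, 72, 71, 95, 73, 74, 70, 69]
    products = refine(readings, 4)
    print(products)
    print(windows_with_peak_above(products, 90))
```

The query function never sees the raw readings. Once `refine` has produced the summaries, the stream itself could be discarded, which is the kind of size reduction the DaaP model relies on. The trade-off is that a summary this simple can only answer the queries it was designed for (here, per-window peaks and means).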

Full Text