Over the last years, big data has emerged as a new paradigm for the processing and analysis of massive volumes of data. Big data processing has been combined with service and cloud computing, leading to a new class of services called “Big Services”. In this new model, services can be seen as an abstract layer that hides the complexity of the processed big data. To meet users' complex and heterogeneous needs in the era of big data, service reuse is a natural and efficient means that helps orchestrating available services' operations, to provide customer on-demand big services. However different from traditional Web service composition, composing big services refers to the reuse of, not only existing high-quality services, but also high-quality data sources, while taking into account their security constraints (e.g., data provenance, threat level and data leakage). Moreover, composing heterogeneous and large-scale data-centric services faces several challenges, apart from security risks, such as the big services' high execution time and the incompatibility between providers' policies across multiple domains and clouds. Aiming to solve the above issues, we propose a scalable approach for big service composition, which considers not only the quality of reused services (QoS), but also the quality of their consumed data sources (QoD). Since the correct representation of big services requirements is the first step towards an effective composition, we first propose a quality model for big services and we quantify the data breaches using L-Severity metrics. Then to facilitate processing and mining big services' related information during composition, we exploit the strong mathematical foundation of fuzzy Relational Concept Analysis (fuzzy RCA) to build the big services' repository as a lattice family. We also used fuzzy RCA to cluster services and data sources based on various criteria, including their quality levels, their domains, and the relationships between them. Finally, we define algorithms that parse the lattice family to select and compose high-quality and secure big services in a parallel fashion. The proposed method, which is implemented on top of Spark big data framework, is compared with two existing approaches, and experimental studies proved the effectiveness of our big service composition approach in terms of QoD-aware composition, scalability, and security breaches.
Read full abstract