Stream Analytics Research Articles

This paper presents the Moriarty framework and one use case: the recommendation of entertainment events. Moriarty is a tool for generating near-real-time Big Data analytics solutions (Streaming Analytics). It enables collaboration between the data scientist and the software engineer, who join forces through Moriarty to generate new software solutions rapidly. The data scientist works with algorithms and data transformations through a visual interface, while the software engineer works with services to be invoked. The underlying idea is that a user can build Artificial Intelligence and Data Analytics projects without writing a single line of code. The tool's main strength is reducing the ‘time to market’ of applications that embed complex Artificial Intelligence algorithms. It builds on several Artificial Intelligence techniques (such as Deep Learning, Natural Language Processing and Semantic Web) and Big Data modules (Spark as a distributed data engine, plus access to NoSQL databases). Moriarty is divided into several layers; its core is a BPMN engine, which defines and executes the data analytics processes, called workflows. Each workflow is defined using the standard BPMN model and is linked to a set of reusable functions or Artificial Intelligence algorithms written following a service-oriented architecture. The example service presented is a recommendation application for restaurants, concerts, entertainment and events in general, in which information is collected from social networks and websites, processed by Natural Language Processing algorithms and finally stored in a graph database.
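To make the idea of a workflow linked to reusable services more concrete, here is a minimal sketch in Python. It is not Moriarty's API: the `SERVICES` registry, the `service` decorator, `run_workflow`, the three task names and the sample posts are all hypothetical. A plain ordered list of task names stands in for the BPMN model, keyword matching stands in for the Natural Language Processing step, and a dictionary stands in for the graph database.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry mapping a workflow task name to a reusable service.
SERVICES: Dict[str, Callable[[dict], dict]] = {}


def service(name: str):
    """Register a function as an invocable service under `name`."""
    def register(fn):
        SERVICES[name] = fn
        return fn
    return register


@service("collect")
def collect(ctx: dict) -> dict:
    # Stand-in for collecting text from social networks and websites
    # (the two posts are made-up sample data).
    posts = ["Jazz concert at Blue Hall", "New tapas restaurant downtown"]
    return {**ctx, "posts": posts}


@service("extract_events")
def extract_events(ctx: dict) -> dict:
    # Stand-in for an NLP step: simple keyword matching per post.
    keywords = ("concert", "restaurant")
    events = [(p, k) for p in ctx["posts"] for k in keywords if k in p.lower()]
    return {**ctx, "events": events}


@service("store_graph")
def store_graph(ctx: dict) -> dict:
    # Stand-in for a graph database: category node -> linked post nodes.
    graph: Dict[str, List[str]] = {}
    for post, category in ctx["events"]:
        graph.setdefault(category, []).append(post)
    return {**ctx, "graph": graph}


def run_workflow(task_names: List[str], ctx: Optional[dict] = None) -> dict:
    """Invoke each named service in order, passing the context along."""
    ctx = dict(ctx or {})
    for name in task_names:
        ctx = SERVICES[name](ctx)
    return ctx


result = run_workflow(["collect", "extract_events", "store_graph"])
print(result["graph"])
```

In this sketch the workflow definition (the list of names) is kept apart from the service implementations, which loosely mirrors the split the abstract describes between the data scientist, who arranges the process, and the software engineer, who provides the services.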

Since distributed stream analytics is treated as a kind of cloud service, there is a pressing need for reliability and fault tolerance: streaming data tuples must be processed in the order of their generation on every dataflow path, with each tuple processed once and only once. Two kinds of approaches currently exist. One treats the whole process as a single transaction and therefore loses intermediate results during failures. The other relies on the receipt of an acknowledgement (ACK), on a per-tuple basis, to decide whether to move forward and emit the next resulting tuple or to resend the current one after a timeout, and thus incurs an extremely high latency penalty. In contrast, we propose the backtrack mechanism for failure recovery. It allows a task to process tuples continuously, without waiting for ACKs and without resending tuples in the failure-free case; a task requests (ASK) the source tasks to resend the missing tuples only when it is restored from a failure, which is rare and thus has limited impact on overall performance. The specific hard problem in building a transaction layer on top of an existing stream processing platform is how to keep track of the physical input/output messaging channels in order to realize re-messaging during failure recovery. Our solution tracks physical messaging channels logically; to that end we introduce the notions of virtual channel, task alias and messageId-set for reasoning about, recording and communicating the channel information. We also provide a designated messaging channel, separate from the regular dataflow channel, for signaling ACK/ASK messages and for resending tuples, in order to avoid interrupting the regular order of data transfer. We have implemented the proposed mechanisms on Fontainebleau, the distributed stream analytics infrastructure we developed on top of Storm. As a principle, we ensure that all transactional properties are system-supported and transparent to users. Our experience shows the novelty and efficiency of the proposed mechanisms.
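The backtrack idea can be illustrated with a small single-process simulation in Python. This is a sketch under stated assumptions, not the Fontainebleau implementation: `SourceTask`, `SinkTask`, `handle_ask`, `recover` and `run_demo` are hypothetical names, the checkpoint is an assumed way of preserving state across a failure, and the virtual channels and the separate signaling channel are not modeled. What it does show is the core behavior: no per-tuple ACK in the failure-free path, and an ASK to the source for missing tuples only after a restore.

```python
from dataclasses import dataclass, field


@dataclass
class SourceTask:
    """Upstream task that buffers what it emits so it can resend on request."""
    alias: str
    sent: dict = field(default_factory=dict)  # message id -> tuple payload
    next_id: int = 0

    def emit(self, payload):
        msg_id = self.next_id
        self.sent[msg_id] = payload
        self.next_id += 1
        return msg_id, payload

    def handle_ask(self, after_id):
        """Resend, in id order, every buffered tuple with id > after_id."""
        return [(i, self.sent[i]) for i in sorted(self.sent) if i > after_id]


@dataclass
class SinkTask:
    """Downstream task that processes tuples without per-tuple ACKs."""
    state: int = 0               # running sum, standing in for operator state
    last_id: int = -1            # highest message id reflected in `state`
    checkpoint: tuple = (0, -1)  # assumed durable copy of (state, last_id)

    def process(self, msg_id, payload):
        if msg_id <= self.last_id:  # already applied: skip the duplicate
            return
        self.state += payload
        self.last_id = msg_id

    def save_checkpoint(self):
        self.checkpoint = (self.state, self.last_id)

    def recover(self, source):
        """Restore the checkpoint, then ASK the source for missing tuples."""
        self.state, self.last_id = self.checkpoint
        for msg_id, payload in source.handle_ask(self.last_id):
            self.process(msg_id, payload)


def run_demo():
    source, sink = SourceTask(alias="src-0"), SinkTask()
    for value in [1, 2, 3]:
        sink.process(*source.emit(value))
    sink.save_checkpoint()
    for value in [4, 5]:
        sink.process(*source.emit(value))
    # Simulated crash: in-memory state is lost, only the checkpoint survives.
    sink.state, sink.last_id = 0, -1
    sink.recover(source)
    return sink.state, sink.last_id


print(run_demo())  # → (15, 4)
```

After recovery the sink holds the same sum it had before the crash, and the duplicate check in `process` keeps each tuple applied only once even if the source resends more than is strictly missing.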


Articles published on Stream Analytics

Query-able Kafka

ViewDF: Declarative incremental view maintenance for streaming data

Analysis of a Sequence of Events in Healthcare

Proposing a streaming Big Data analytics (SBDA) platform for condition based maintenance (CBM) and monitoring transportation systems

Apache REEF

CitizenHelper: A Streaming Analytics System to Mine Citizen and Web Data for Humanitarian Organizations

Defining the execution semantics of stream processing engines

Machine Learning and IoT Streaming Analytics

Moriarty: Improving ‘Time To Market’ in big data and Artificial intelligence applications

Towards Real-time Streaming Analytics based on Cloud Computing

REEF: Retainable Evaluator Execution Framework.

CHive: Bandwidth Optimized Continuous Querying in Distributed Clouds

Cloud Based Fuzzy Healthcare System

Fault Tolerant Distributed Stream Processing based on Backtracking

Multi-dimensional Query Authentication for On-line Stream Analytics

