Abstract

The amount of data being generated has increased enormously since the explosion of the Internet. This data, usually collected in different formats and from multiple sources, is popularly termed big data. Big data contains uncertainty. To handle that uncertainty, probabilistic reasoning is used to develop probabilistic models that encode generic knowledge about different topics; these models are then used in conjunction with an inference algorithm to support decision makers, especially in uncertain situations. Developing such models requires extensive knowledge of fields such as statistics, machine learning, and probability theory, and is therefore usually a difficult undertaking. Probabilistic programming was introduced to simplify and enable the development of complex models. In addition, decision makers often need knowledge from historic data as well as current data to make cogent decisions, hence the need to unify the processing of historic and real-time data with low latency; the Lambda architecture was introduced for this purpose. This paper presents a framework called Kognitor that simplifies the design and development of complex models using probabilistic programming and the Lambda architecture. An evaluation of the framework using a case study is also presented, highlighting the potential of probabilistic programming to simplify model development and enable real-time reasoning on big data, thereby demonstrating the framework's effectiveness. The results of this evaluation are reported. Kognitor can be used to steer the effective and easier implementation of complicated real-life situations as probabilistic models, which will benefit the big data processing domain and decision makers. It ensures cost-effectiveness by using contemporary big data tools and technologies on commodity hardware, and will also be beneficial in academia with respect to the use of probabilistic programming.
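
The following minimal Python sketch illustrates the core idea behind the Lambda architecture as described above: a query is answered by merging a batch view precomputed over historic data with a speed view updated incrementally from the real-time stream. The view names, keys, and counts here are illustrative assumptions, not details of Kognitor itself.

    # Minimal sketch of a Lambda-style serving layer. In a real system the
    # batch view is recomputed periodically over all historic data and the
    # speed view is updated incrementally from the live stream; plain
    # dictionaries stand in for both here. All names and values are
    # illustrative assumptions.
    batch_view = {"sensor_42": 1050}  # aggregate from the last full batch run
    speed_view = {"sensor_42": 7}     # events that arrived since that run

    def query(key: str) -> int:
        # Serving layer: merge the (stale but complete) batch view with
        # the (fresh but partial) speed view for a low-latency answer.
        return batch_view.get(key, 0) + speed_view.get(key, 0)

    print(query("sensor_42"))  # 1057: historic total plus recent increments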

Highlights

  • Planning, analysis, and/or calculation generate facts that are usually not organized

  • This paper presents a framework that demonstrates the effectiveness of probabilistic programming in the development of big data processing systems that use complex probabilistic models

Introduction

Analysis and/or calculation generate facts that are usually not organized. Huge amounts of data are produced and amassed, sometimes as a by-product of the activities and processes of entities and individuals [1]. These huge volumes of data are termed big data, and novel tools and techniques are required to manage and analyze them effectively. The technologies developed to analyze big data were mainly geared towards batch processing [29]. The majority of these batch processing tools used the MapReduce framework designed by Google [30]; Hadoop is a popular example of a batch processing big data tool implemented with MapReduce.
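
To make the MapReduce model concrete, here is a minimal single-process Python sketch of the canonical word count example. In Hadoop the map and reduce functions run distributed across a cluster; this sketch only illustrates the data flow between the two phases.

    from collections import defaultdict

    def map_phase(document: str):
        # Map: emit a (word, 1) pair for every word in the input split.
        for word in document.split():
            yield (word, 1)

    def reduce_phase(pairs):
        # Shuffle + reduce: group the emitted pairs by key and sum the
        # counts, producing the total occurrences of each word.
        counts = defaultdict(int)
        for word, count in pairs:
            counts[word] += count
        return dict(counts)

    print(reduce_phase(map_phase("big data needs big tools")))
    # {'big': 2, 'data': 1, 'needs': 1, 'tools': 1}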
