Abstract

Traditional approaches to managing enterprise data revolve around a batch-driven Extract-Transform-Load (ETL) process, a one-size-fits-all approach to storage, and an application architecture that is tightly coupled to the underlying data infrastructure. The emergence of Big Data technologies has led to alternate instantiations of the traditional approach, in which storage has moved from relational databases to distributed storage technologies such as HDFS. This approach to data management has been found wanting as enterprises begin to deal with complex and heterogeneous data, especially in the area of the Internet of Things (IoT). IoT environments are characterized by heterogeneous data producers and diverse data processing requirements. In this paper, we articulate the shortcomings of traditional approaches to data management in the context of IoT. We identify the challenges brought forth by content heterogeneity, by requirements of scale and robustness in ETL processes, and by the need to rapidly onboard and support multiple applications such as analytics. Our approach introduces the Linked Enterprise Data Model (LEDM), a knowledge representation approach derived from Linked Data for modeling and linking the disparate aspects of data infrastructure. We leverage this model to develop a scalable and robust ETL framework. The framework adopts the Lambda architecture and supports both stream and batch processing of incoming data. We build this capability for the streaming leg of the Lambda architecture, comprising Amazon Kinesis, Apache Spark Streaming, and Amazon Dynamo.
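As a rough illustration of the streaming leg the abstract describes (Kinesis feeding Spark Streaming, which writes to a Dynamo-family store), the sketch below shows one plausible wiring. All stream, table, and field names here are illustrative assumptions, not details taken from the paper; the AWS wiring is confined to `main()` and requires `pyspark`, `boto3`, and live credentials.

```python
# Hypothetical sketch of the abstract's streaming leg:
# Amazon Kinesis -> Apache Spark Streaming -> DynamoDB-style store.
# Names ("iot-sensor-stream", "iot_events", "device_id", ...) are assumed.
import json


def record_to_item(raw_record):
    """Parse one raw IoT record (JSON bytes/str) into a key-value item.

    Assumes (hypothetically) a partition key "device_id" and a sort key
    "timestamp"; the remaining payload is stored as a JSON string.
    """
    event = json.loads(raw_record)
    return {
        "device_id": event["device_id"],
        "timestamp": event["timestamp"],
        "payload": json.dumps(event.get("payload", {})),
    }


def main():
    # Third-party imports kept local so the pure transform above is testable
    # without Spark or AWS installed.
    import boto3
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

    sc = SparkContext(appName="iot-etl-stream")
    ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

    # Attach a receiver to the (assumed) Kinesis stream.
    stream = KinesisUtils.createStream(
        ssc, "iot-etl-stream", "iot-sensor-stream",
        "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
        InitialPositionInStream.LATEST, checkpointInterval=10)

    def write_partition(records):
        # One DynamoDB client per partition; batch_writer buffers puts.
        table = boto3.resource("dynamodb").Table("iot_events")
        with table.batch_writer() as writer:
            for raw in records:
                writer.put_item(Item=record_to_item(raw))

    stream.foreachRDD(lambda rdd: rdd.foreachPartition(write_partition))
    ssc.start()
    ssc.awaitTermination()


if __name__ == "__main__":
    main()
```

Keeping the per-record transform (`record_to_item`) separate from the I/O wiring is a common choice in Lambda-style pipelines, since the same transform can then be reused verbatim in the batch leg.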
