Abstract
Internet-of-Things (IoT) devices generate data at high speed and in large volumes. These data often require real-time processing to keep the system responsive, which localised Cloud and/or Fog computing paradigms can provide. However, many large IoT deployments, such as sensor networks in remote areas, have only sparse Internet connectivity, challenging the localised Cloud and/or Fog computing paradigms. With the advent of the Raspberry Pi, a credit-card-sized single-board computer, there is a great opportunity to construct a low-cost, low-power portable cloud that supports real-time data processing next to IoT deployments. In this paper, we extend our previous work on constructing a Raspberry Pi Cloud to study its feasibility for real-time big data analytics under realistic application-level workloads in both native and virtualised environments. We have extensively tested the performance of a single Raspberry Pi 2 Model B node with httperf and of a 12-node cluster with Apache Spark and HDFS (Hadoop Distributed File System). Our results demonstrate that our portable cloud is useful for supporting real-time big data analytics. On the other hand, our results also reveal that the overhead for CPU-bound workloads in the virtualised environment is surprisingly high, at 67.2%. We have found that, for big data applications, the virtualisation overhead is fractional for small jobs but becomes more significant for large jobs, up to 28.6%.
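The overhead figures quoted above can be read as relative slowdowns of the virtualised run against the native run. A minimal sketch of that conventional calculation, using illustrative runtimes rather than the paper's raw measurements:

```python
# Hedged sketch: virtualisation overhead expressed as a relative slowdown,
# in percent. The runtimes below are illustrative placeholders, not the
# paper's measured values.
def overhead_pct(native_time: float, virtualised_time: float) -> float:
    """Relative slowdown of the virtualised run versus the native run, in %."""
    return (virtualised_time - native_time) / native_time * 100.0

# Illustrative: a workload taking 100 s natively and 167.2 s virtualised
# corresponds to the 67.2% CPU-bound overhead figure quoted above.
print(round(overhead_pct(100.0, 167.2), 1))  # 67.2
```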
Highlights
Low-cost, low-power embedded devices are ubiquitous, part of the Internet-of-Things (IoT). These devices, or things, include RFID tags, sensors, actuators, smartphones, etc., which have a substantial impact on our everyday life and behaviour [1]
We extend our previous work on constructing a Raspberry Pi Cloud to study its feasibility for real-time big data analytics under realistic application-level workloads in both native and virtualised environments
Spark can be deployed in three ways: (1) Standalone mode, where Spark interacts with HDFS directly, and MapReduce can run alongside it on the same cluster; (2) Hadoop YARN, where Spark runs on top of YARN, Hadoop's distributed resource manager; (3) Spark in MapReduce (SIMR), where Spark jobs are launched from within MapReduce, in addition to the standalone deployment
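The first two modes differ mainly in which cluster manager a job is submitted to, selected by the master URL. A minimal sketch of that mapping, where the hostname is a placeholder and 7077 is the standalone master's default port:

```python
# Hedged sketch: the master URL typically used when submitting a Spark job
# under each deployment mode. "master-host" is a placeholder hostname; SIMR
# is omitted because it is launched through its own wrapper script.
DEPLOYMENT_MODES = {
    "standalone": "spark://master-host:7077",  # Spark's built-in cluster manager
    "yarn": "yarn",                            # resources managed by Hadoop YARN
}

def master_url(mode: str) -> str:
    """Return the master URL for a given deployment mode."""
    try:
        return DEPLOYMENT_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown deployment mode: {mode}")
```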
Summary
Low-cost, low-power embedded devices are ubiquitous, part of the Internet-of-Things (IoT). This calls for a radically new computing paradigm which: (1) is capable of processing data efficiently; (2) has the agility of Cloud Computing; (3) is portable to support on-demand physical mobility; and (4) is low-cost and low-power for sustainable computing in remote areas. This new computing paradigm has been made possible by the emergence of a low-cost, low-power, credit-card-sized single-board computer: the Raspberry Pi [5]. We designed and conducted a set of experiments to test the performance of a single node and of a cluster of 12 Raspberry Pi 2 boards with realistic network- and CPU-bound workloads in both native and virtualised environments.