Abstract

Aim/Purpose: When Data Science students use a Cloud environment such as AWS or Azure, they get no direct hands-on experience with the underlying hardware components. When they create virtual machines in the Cloud, they specify memory size, CPUs, disk space, etc., but they cannot reach out and touch the hardware itself because it resides in the Cloud.

Background: Purchasing commodity servers (e.g., $3,000 per Dell server) to create a cluster of multiple machines is cost prohibitive for most faculty and students: 10 machines can cost upwards of $30,000. This cost does not include the other hardware components required for the cluster, such as cooling equipment, cables, and a rack.

Methodology: A prototype was built to evaluate the cost of using inexpensive hardware and software ($1,628.82) in comparison to a more expensive cluster of commodity servers ($30,000).

Contribution: There is very little research literature on using Raspberry Pi servers as an inexpensive replacement for commodity servers.

Findings: This paper demonstrates that Raspberry Pi 4B servers (with 8 GB of RAM) can be used to build a cluster of low-cost servers that runs both Ubuntu Linux 20 and MongoDB Sharding (distributed processing).

Recommendations for Practitioners: This paper serves as a tutorial that describes assembling the cluster components and then installing MongoDB Sharding (distributed processing) on a cluster of 9 Raspberry Pi 4B servers.

Recommendations for Researchers: This paper provides a new, inexpensive alternative to a Cloud environment or an expensive cluster of commodity servers for researching distributed processing.

Impact on Society: Students and faculty now have an inexpensive option for creating a personalized cluster of servers to experiment with distributed processing.

Future Research: Future research can include testing this cluster with other distributed processing tools, such as the Hadoop ecosystem or NoSQL databases (e.g., Cassandra).
