Abstract

High Performance Computing (HPC) has traditionally been characterized by low latency, high throughput, and massive parallelism across massively distributed systems. Big Data, or analytics, platforms share some of these characteristics but today offer only limited guarantees on latency and throughput. Big Data platforms have been applied to problems where the data being operated on is in motion, while HPC has traditionally been applied to scientific computations where data is at rest. The programming paradigms used in Big Data platforms, for example MapReduce (Google Research Publication: MapReduce. Retrieved November 29, 2016, from http://research.google.com/archive/mapreduce.html) and Spark Streaming (Spark Streaming / Apache Spark. Retrieved November 29, 2016, from https://spark.apache.org/streaming/), have their genesis in HPC, but they must address characteristics that are distinct to Big Data platforms. Bringing high performance to Big Data platforms therefore means addressing the following: (1) ingesting data at high volume with low latency; (2) processing streaming data at high volume with low latency; (3) storing data in a distributed data store; and (4) indexing and searching the stored data for real-time processing.
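The MapReduce paradigm named above can be sketched in miniature as a word count in Python. This is a toy, single-process illustration of the map and reduce phases only, not the distributed implementation described in the cited Google publication; the function names are chosen here for illustration.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Reduce: group pairs by key and sum the counts.
    # (In a real MapReduce framework the grouping is done by a
    # distributed shuffle between the two phases.)
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big compute", "data in motion"]
result = reduce_phase(map_phase(docs))
# result counts each word across all documents,
# e.g. "big" -> 2, "data" -> 2
```

In a distributed setting the map tasks run in parallel over partitions of the input and the reduce tasks run in parallel over partitions of the key space, which is what gives the paradigm its scalability.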
