Abstract

High Performance Computing (HPC) has traditionally been characterized by low latency, high throughput, and massive parallelism across massively distributed systems. Big Data, or analytics, platforms share some of these characteristics but today offer only limited guarantees on latency and throughput. Big Data platforms have been applied to problems where the data being operated on is in motion, while HPC has traditionally been applied to scientific computations where data is at rest. The programming paradigms used in Big Data platforms, for example MapReduce (Google Research Publication: MapReduce. Retrieved November 29, 2016, from http://research.google.com/archive/mapreduce.html) and Spark Streaming (Spark Streaming / Apache Spark. Retrieved November 29, 2016, from https://spark.apache.org/streaming/), have their genesis in HPC, but they must address characteristics that are distinct to Big Data platforms. Bringing high performance to Big Data platforms therefore means addressing the following: (1) ingesting data at high volume with low latency; (2) processing streaming data at high volume with low latency; (3) storing data in a distributed data store; and (4) indexing and searching the stored data for real-time processing.
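The MapReduce paradigm named above can be sketched in miniature as a word count in Python. This is a toy, single-process illustration of the map and reduce phases only, not the distributed implementation described in the cited Google publication; the function names are chosen here for illustration.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Reduce: group pairs by key and sum the counts.
    # (In a real MapReduce framework the grouping is done by a
    # distributed shuffle between the two phases.)
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big compute", "data in motion"]
result = reduce_phase(map_phase(docs))
# result counts each word across all documents,
# e.g. "big" -> 2, "data" -> 2
```

In a distributed setting the map tasks run in parallel over partitions of the input and the reduce tasks run in parallel over partitions of the key space, which is what gives the paradigm its scalability.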
