A view of programming scalable data analysis: from clouds to exascale

Domenico Talia

doi:10.1186/s13677-019-0127-x

Abstract

Scalability is a key feature for big data analysis and machine learning frameworks and for applications that need to analyze very large and real-time data available from data repositories, social media, sensor networks, smartphones, and the Web. Scalable big data analysis today can be achieved by parallel implementations that are able to exploit the computing and storage facilities of high performance computing (HPC) systems and clouds, whereas in the near future Exascale systems will be used to implement extreme-scale data analysis. This paper discusses how clouds currently support the development of scalable data mining solutions, and it outlines and examines the main challenges that must be addressed to implement innovative data analysis applications on Exascale systems.

Highlights

Solving problems in science and engineering was the first motivation for inventing computers
In this paper we first discuss cloud-based scalable data mining and machine learning solutions, then we examine the main research issues that must be addressed for implementing massively parallel data mining applications on Exascale computing systems
Among real-world applications, every large-scale data mining and machine learning software system under development today in the areas of social data analysis and bioinformatics will certainly benefit from the availability of Exascale computing systems and from Exascale programming environments that offer massive and adaptive-grain parallelism, data locality, and local communication and synchronization mechanisms, together with the other features discussed in the previous sections that are needed to reduce execution time and make the solution of new problems and challenges feasible

Summary

Introduction

Solving problems in science and engineering was the first motivation for inventing computers. Together with other approaches, such as Pig Latin and ECL, those programming models, languages and APIs must be further investigated, designed and adapted to provide data-centric scalable programming models that support the reliable and effective implementation of Exascale data analysis applications. Such applications may be composed of up to millions of computing units that process small data elements and exchange them with a very limited set of processing elements.
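The pattern described above, in which many computing units each process a small local portion of the data and exchange results with only a few partners, can be illustrated with a minimal sketch. This example is not taken from the paper: the function names (`partition`, `local_count`, `tree_reduce`, `count_items`) are hypothetical, the computing units are simulated sequentially in one process, and item counting merely stands in for a real data analysis task.

```python
from collections import Counter


def partition(items, n_units):
    """Split items into n_units contiguous chunks, one per simulated unit."""
    size, extra = divmod(len(items), n_units)
    chunks, start = [], 0
    for i in range(n_units):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def local_count(chunk):
    """Work done by one unit, using only its own local data."""
    return Counter(chunk)


def tree_reduce(partials):
    """Merge partial results pairwise.

    In each round a unit exchanges data with a single partner, so no unit
    ever communicates with more than a few others (assumed stand-in for
    "local communication").
    """
    partials = list(partials)
    step = 1
    while step < len(partials):
        for i in range(0, len(partials) - step, 2 * step):
            partials[i] = partials[i] + partials[i + step]
        step *= 2
    return partials[0] if partials else Counter()


def count_items(items, n_units=4):
    """Count items by local processing followed by a pairwise reduction."""
    return tree_reduce(local_count(c) for c in partition(items, n_units))
```

Calling `count_items(list("abracadabra"), n_units=4)` gives the same counts as a single global `Counter`. The pairwise reduction needs a number of rounds that grows logarithmically with the number of units, which is one reason this style of exchange is often preferred over gathering all partial results at a single unit.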

Journal: Journal of Cloud Computing	Publication Date: Feb 11, 2019
Citations: 18	License type: open-access