Abstract

Scalability is a key feature for big data analysis and machine learning frameworks, and for applications that need to analyze very large and real-time data available from data repositories, social media, sensor networks, smartphones, and the Web. Today, scalable big data analysis can be achieved through parallel implementations that exploit the computing and storage facilities of high performance computing (HPC) systems and clouds, whereas in the near future Exascale systems will be used to implement extreme-scale data analysis. This paper discusses how clouds currently support the development of scalable data mining solutions, and outlines and examines the main challenges to be addressed for implementing innovative data analysis applications on Exascale systems.

Highlights

  • Solving problems in science and engineering was the first motivation for inventing computers

  • In this paper we first discuss cloud-based scalable data mining and machine learning solutions, and then examine the main research issues that must be addressed for implementing massively parallel data mining applications on Exascale computing systems

  • In terms of real-world applications, every large-scale data mining and machine learning application now under development in areas such as social data analysis and bioinformatics will certainly benefit from Exascale computing systems and from Exascale programming environments that offer massive and adaptive-grain parallelism, data locality, and local communication and synchronization mechanisms, together with the other features discussed in the previous sections, which are needed to reduce execution time and make it feasible to solve new problems and challenges

Summary

Introduction

Solving problems in science and engineering was the first motivation for inventing computers. Existing programming models, languages and APIs, together with different approaches such as Pig Latin and ECL, must be further investigated, designed and adapted to provide data-centric scalable programming models that support the reliable and effective implementation of Exascale data analysis applications. Such applications may be composed of up to millions of computing units that process small data elements and exchange them with a very limited set of processing elements.
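The paper gives no code for this model. As a minimal sketch of the "many small units, limited exchange" pattern, the sequential Python simulation below assumes a ring topology in which each computing unit holds one small data element and, at every synchronous step, reads only from its two neighbours. The function names and the neighbour-averaging rule are hypothetical choices made for illustration; they are not the author's programming model.

```python
from typing import List


def local_exchange_step(values: List[float]) -> List[float]:
    """One synchronous step (hypothetical rule): each unit averages its own
    element with those of its left and right ring neighbours only."""
    n = len(values)
    return [
        (values[(i - 1) % n] + values[i] + values[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


def run(values: List[float], steps: int) -> List[float]:
    """Repeat the local exchange; no unit ever reads a non-neighbour."""
    for _ in range(steps):
        values = local_exchange_step(values)
    return values


if __name__ == "__main__":
    data = [float(i) for i in range(8)]  # one small element per unit
    result = run(data, steps=50)
    # The global sum is preserved, and values drift toward the global mean,
    # even though each step uses only neighbour-to-neighbour communication.
    print(sum(data), sum(result))
```

Because each unit communicates with a fixed number of neighbours, the per-step communication per unit stays constant as the number of units grows. A real implementation would distribute the units across nodes (for example with MPI) rather than loop over them in a single process.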
