Abstract
Scalability is a key feature of big data analysis and machine learning frameworks, and of applications that need to analyze very large and real-time data available from data repositories, social media, sensor networks, smartphones, and the Web. Today, scalable big data analysis can be achieved through parallel implementations that exploit the computing and storage facilities of high performance computing (HPC) systems and clouds; in the near future, Exascale systems will be used to implement extreme-scale data analysis. This paper discusses how clouds currently support the development of scalable data mining solutions, and outlines and examines the main challenges that must be addressed to implement innovative data analysis applications on Exascale systems.
Highlights
Solving problems in science and engineering was the first motivation for inventing computers
In this paper we first discuss cloud-based scalable data mining and machine learning solutions, then examine the main research issues that must be addressed to implement massively parallel data mining applications on Exascale computing systems
In terms of real-world applications, large-scale data mining and machine learning software currently under development in the areas of social data analysis and bioinformatics will certainly benefit from the availability of Exascale computing systems and from Exascale programming environments that offer massive and adaptive-grain parallelism, data locality, and local communication and synchronization mechanisms, together with the other features discussed in the previous sections that are needed to reduce execution time and make new problems and challenges feasible to solve
Summary
Solving problems in science and engineering was the first motivation for inventing computers. Together with different approaches, such as Pig Latin and ECL, those programming models, languages, and APIs must be further investigated, designed, and adapted to provide data-centric scalable programming models that support the reliable and effective implementation of Exascale data analysis applications. Such applications may be composed of up to millions of computing units that process small data elements and exchange them with a very limited set of processing elements.