Abstract

The amount of digital data is increasing beyond any previous estimation, and data stores and sources are increasingly pervasive and distributed. Professionals and scientists need advanced data analysis tools and services, coupled with scalable architectures, to support the extraction of useful information from big data repositories. Cloud computing systems offer effective support for addressing both the computational and data storage needs of big data mining and parallel knowledge discovery applications. In fact, complex data mining tasks involve data- and compute-intensive algorithms that require large and efficient storage facilities together with high-performance processors to produce results in acceptable times. In this paper we introduce the topic and the main research issues. We discuss how to make knowledge discovery services scalable and present the Data Mining Cloud Framework, designed for developing and executing distributed data analytics applications as workflows of services. In this environment, data sets, analysis tools, data mining algorithms and knowledge models are implemented as single services that can be combined through a visual programming interface into distributed workflows to be executed on Clouds. The main features of the programming interface are described, and a performance evaluation of knowledge discovery applications is reported.
