Abstract
In the last few decades, Grid technologies have emerged as an important area in parallel and distributed computing. The Grid can be seen as a large-scale computational platform and, in some cases, as a high-performance one. In recent years, the data mining community has increasingly used Grid facilities to store, share, manage, and mine data for large-scale data-driven applications. Indeed, data mining and knowledge discovery applications are distributed by nature and use the Grid as their execution environment. This has led to great interest within the community in distributed data mining and knowledge discovery on large Grid platforms. Many Grid-based Data Mining (DM) and Knowledge Discovery (KD) frameworks have been initiated, proposing different techniques and solutions for mining large-scale datasets. These include the ADMIRE project initiated by the PCRG (Parallel Computational Research Group) at University College Dublin, the Knowledge Grid project at the University of Calabria, and the GridMiner project at the University of Vienna, among others. These knowledge discovery frameworks on the Grid aim to offer high-level abstractions and techniques for distributed management, mining, and knowledge extraction from data repositories and warehouses. Most of them use existing Grid technologies and systems to build specific knowledge discovery services and data management, analysis, and mining techniques. Basically, this consists of either porting existing algorithms and applications to the Grid, or developing new mining and knowledge extraction techniques that exploit the Grid's features and services. Grid infrastructures usually provide basic services such as communication, authentication, storage and computing resources, and data placement and management. For example, the Knowledge Grid system uses services provided by the Globus Toolkit, and the ADMIRE framework uses a Grid system called DGET, developed by our team at University College Dublin.
We give some details about the best-known DM/KD frameworks in Section 2. Note that this chapter is not devoted to Grid systems or to the way they are interfaced with knowledge discovery frameworks. Indeed, beyond the architecture design of Grid systems, the resource and data management policies, the data integration or placement techniques, and so on, these DM and KD frameworks need