Abstract

The OLAP systems can be an improvement for ecological studies. In fact, ecology studies, follows and analyzes phenomenon across space and time and according to several parameters. OLAP systems can provide to ecologists browsing in a large dataset. One focus of the current research on OLAP system is the automatic design of OLAP cubes and of data warehouse schemas. This kind of works makes accessible OLAP technology to non information technology experts. But to be efficient, the automatic OLAP building must take into account various cases.Moreover the OLAP technology is based on the concept of hierarchy. Thereby the hierarchical clustering methods are often used by OLAP system designer.In this article, we propose using hierarchical agglomerative clustering with a metric that comes from ecological studies (the Gower similarity index) to build automatically hierarchical dimensions in an OLAP cube. With this similarity index we can perform a hierarchical clustering on heterogeneous datasets that contains qualitative and quantitative variables.We offer a prototypical automatic system which builds dimension for an OLAP cube and we measure the performances of this system according to the number of clustered individuals and according to the number of variables used for clustering. Thanks to these measures we can offer an approximation of performances with a large dataset.Thereby the Gower index in a hierarchical agglomerative clustering permits the management of heterogeneous dataset with missing values in a context of automatic building of OLAP cube. With this methodology, we can build new dimensions based on hierarchies in the data, which are not evident. The data mining methods can complete the expert knowledge during the design of an OLAP cube, because these methods can explain the inherent structure of the data.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call