Abstract

Data warehouses are powerful tools for making better and faster decisions in organizations where information is an asset of primary importance. Due to the complexity of data warehouses, metrics and procedures are required to continuously assure their quality. This article describes an empirical study and a replication aimed at investigating the use of structural metrics as indicators of the understandability and, by extension, the cognitive complexity of data warehouse schemas. More specifically, a four-step analysis is conducted: (1) check whether, individually and collectively, the considered metrics can be correlated with schema understandability using classical statistical techniques; (2) evaluate whether understandability can be predicted by case similarity using the case-based reasoning technique; (3) determine, for each level of understandability, the subsets of metrics that are important by means of a classification technique; and (4) assess, by means of a probabilistic technique, the degree of participation of each metric in the understandability prediction. The results show that although a linear model is a good approximation of the relation between structure and understandability, the associated coefficients are not statistically significant. Additionally, the classification analyses reveal, respectively, that prediction can be achieved by considering structural similarity, that extracted classification rules can be used to estimate the magnitude of understandability, and that some metrics, such as the number of fact tables, have more impact than others.
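Step (1) of the analysis correlates structural metrics with measured understandability. As a minimal sketch of that step, the snippet below computes the Pearson correlation between one hypothetical structural metric (number of fact tables, NFT) and mean understandability scores; all data values are invented for illustration and do not come from the study.

```python
# Hedged sketch of step (1): correlating a single structural metric with
# schema understandability. The metric name (NFT) and all values below
# are hypothetical, not taken from the reported experiments.
import numpy as np

# Number of fact tables per schema (hypothetical)
nft = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

# Mean understandability score per schema, higher = easier (hypothetical)
understandability = np.array([4.8, 4.5, 4.1, 4.3, 3.6, 3.4, 3.0, 3.2, 2.5, 2.7])

# Pearson correlation coefficient between the metric and the scores
r = np.corrcoef(nft, understandability)[0, 1]
print(f"Pearson r = {r:.3f}")
```

In this toy data the correlation is strongly negative, i.e. schemas with more fact tables are rated as harder to understand, which mirrors the abstract's observation that the number of fact tables has a comparatively large impact; in the real study several metrics would be tested jointly as well.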
