Abstract
A data warehouse provides the foundation for businesses to make informed decisions about day-to-day operations and future strategy. Because this role is pivotal to the growth and success of a business, the quality of the data warehouse is critical. Conceptual models of data warehouses give considerable insight into the quality of the developed system during the early stages of the design process. Researchers have proposed a number of metrics to evaluate the quality of these object-oriented multidimensional models. For these metrics to be used in practice, empirical evaluation is crucial. The literature contains a number of proposals for empirically validating metrics, but most are restricted to either statistical techniques or supervised machine learning techniques. Empirically validating the metrics requires collecting user responses for a number of schemas and recording observations to quantify model quality aspects such as understandability and efficiency. This can introduce personal biases, errors, and random outliers, which affect the evaluation model. In this paper, we make a first attempt to assess the relationship between the structural metrics of object-oriented multidimensional data warehouse models and the understandability of those models, using unsupervised machine learning techniques with the aid of a data warehouse quality expert. The results indicate that the proposed metrics have a strong relationship with understandability, and in turn with the quality, of data warehouse conceptual models, and that the unsupervised techniques are able to identify this relationship with a high degree of accuracy.