Abstract

There has been considerable research on discovering functional dependencies algorithmically from databases. However, when a database contains numerical attributes, some of the discovered functional dependencies may not be genuine, because numerical attributes can take a wide variety of values. Regression analysis, on the other hand, is a method that fits a model to observed continuous or numerical variables and measures its goodness of fit. In this paper, we show how to determine whether functional dependencies discovered among numerical attributes have explanatory power by performing multivariate linear regression tests. Explanatory power is checked with the adjusted R-squared, together with other diagnostics: checks for multicollinearity, the Durbin-Watson test for the independence of residuals, and the F value for the suitability of the regression models. For the experiments, we used the Vinho Verde wine quality data sets from the UCI Machine Learning Repository and found that only 48.7% and 30.7% of the functional dependencies discovered by the FDtool algorithm have explanatory power for the red wine and white wine data sets, respectively. We therefore conclude that functional dependencies found by such algorithms should be applied with care. In addition, as a possible application of the functional dependencies discovered among the conditional attributes of the data sets, we built a series of random forests after dropping the redundant attributes that appear on the right-hand side of the explanatory functional dependencies, and obtained good results. Hence, for mass-produced wines such as Vinho Verde, the effort of quality checking can be reduced by not collecting data for the redundant attributes, since samples with as few attribute values as possible are sufficient.
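To make the regression check concrete, the following is a minimal sketch, assuming Python with pandas and statsmodels, and assuming a local semicolon-separated copy of the UCI red-wine file winequality-red.csv. The left-hand-side and right-hand-side attributes chosen here are purely illustrative and are not one of the dependencies reported by FDtool.

```python
# Minimal sketch: regression diagnostics for a candidate functional
# dependency LHS -> RHS over numerical attributes (assumptions above).
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

# Assumed local copy of the UCI red-wine file (semicolon-separated).
df = pd.read_csv("winequality-red.csv", sep=";")

lhs = ["fixed acidity", "citric acid", "residual sugar"]  # hypothetical LHS
rhs = "density"                                           # hypothetical RHS

X = sm.add_constant(df[lhs])   # design matrix with intercept
y = df[rhs]
fit = sm.OLS(y, X).fit()

print("adjusted R-squared:", fit.rsquared_adj)           # explanatory power
print("F value / p-value :", fit.fvalue, fit.f_pvalue)   # model suitability
print("Durbin-Watson     :", durbin_watson(fit.resid))   # residual independence

# Variance inflation factors (multicollinearity among LHS attributes);
# column 0 of the design matrix is the intercept, so it is skipped.
for i, name in enumerate(lhs, start=1):
    print(f"VIF({name}) = {variance_inflation_factor(X.values, i):.2f}")
```

Under this kind of check, a dependency would be kept as explanatory only if the adjusted R-squared is high enough and the multicollinearity, independence, and F-value diagnostics are acceptable; the exact thresholds are those chosen in the paper, not fixed by the sketch.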
