Abstract

The chi-squared test is a standard statistical test of independence between categorical variables. It is therefore advisable to test the attributes of a dataset and remove redundant ones before supplying the dataset to machine learning algorithms. However, when a dataset has many attributes, as is common in real-world data, it is not easy to choose which pairs of attributes to test. Meanwhile, several algorithms have been proposed that automatically discover functional dependencies from data. Because a functional dependency expresses a many-to-one relationship between the values of attributes, we conjecture that attributes linked by a discovered functional dependency are also statistically dependent. Automated functional dependency discovery could thus be used to select candidate attributes for statistical dependence tests. To check whether discovered functional dependencies do reflect statistical dependence in real-world data, we ran experiments on three real-world datasets, using the open-source tool FDtool to discover functional dependencies and SPSS to test them for statistical dependence. The experiments confirmed statistical dependence in the discovered functional dependencies and showed improved decision trees after the dependent attributes were removed.
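The link between a functional dependency and statistical dependence can be illustrated with a minimal sketch. It is not FDtool's algorithm or the SPSS procedure used in the experiments. The helper names `holds_fd` and `chi_squared` and the toy rows are hypothetical, chosen only for illustration. The sketch checks whether one attribute functionally determines another, then computes the Pearson chi-squared statistic for the same pair.

```python
from collections import Counter

def holds_fd(rows, lhs, rhs):
    """Return True if each value of `lhs` maps to exactly one value of `rhs`."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

def chi_squared(rows, a, b):
    """Pearson chi-squared statistic and degrees of freedom for attributes a, b."""
    n = len(rows)
    joint = Counter((row[a], row[b]) for row in rows)
    count_a = Counter(row[a] for row in rows)
    count_b = Counter(row[b] for row in rows)
    stat = 0.0
    for va, na in count_a.items():
        for vb, nb in count_b.items():
            expected = na * nb / n          # count expected under independence
            observed = joint.get((va, vb), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(count_a) - 1) * (len(count_b) - 1)
    return stat, dof

# Hypothetical toy data: zip -> city holds, zip -> color does not.
rows = [
    {"zip": "10001", "city": "NYC", "color": "red"},
    {"zip": "10001", "city": "NYC", "color": "blue"},
    {"zip": "10002", "city": "NYC", "color": "red"},
    {"zip": "60601", "city": "CHI", "color": "blue"},
    {"zip": "60601", "city": "CHI", "color": "red"},
    {"zip": "60602", "city": "CHI", "color": "blue"},
]

fd_zip_city = holds_fd(rows, "zip", "city")
fd_zip_color = holds_fd(rows, "zip", "color")
stat, dof = chi_squared(rows, "zip", "city")
```

In a real analysis the statistic would be compared against the chi-squared distribution with `dof` degrees of freedom, and the expected counts would need to be large enough for the test to be valid. This toy table is far too small for that and only shows the mechanics.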

