Abstract

Data pre-processing tends to be the most critical and time-consuming step in the data mining process. Understanding the interdependencies among attributes is especially important for attribute selection and model structure design. Correlation measures, such as the Pearson correlation coefficient, have typically been used to measure attribute dependencies. Correlation is useful for capturing linear dependency among quantitative attributes, and is invariant only under linear transformations of the variables. More recently, mutual information has been used to measure interdependencies among attributes measured on a continuous scale. Mutual information is applicable to both quantitative and categorical variables, captures any type of functional dependency between variables, and is invariant under one-to-one transformations. In this paper, we employ mutual information as a unified measure of interdependencies among attributes, by extending it to accommodate attributes measured on both continuous and categorical scales. We further visualize the attribute interdependencies using a host of techniques, including hierarchical clustering, multidimensional scaling, and self-organizing maps. The use of mutual information permits identification of some salient interdependencies between attributes. We demonstrate the utility of the proposed methodology using real data mining applications.
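The contrast between correlation and mutual information drawn above can be illustrated with a small sketch. It assumes discrete-valued attributes and uses a simple plug-in (frequency-count) estimate; it is not the extension to mixed continuous and categorical attributes that the paper proposes, and the function names `mutual_information` and `pearson` are hypothetical helpers introduced only for this illustration. The toy data follow y = x^2, a purely nonlinear dependency.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # joint counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def pearson(xs, ys):
    """Sample Pearson correlation coefficient for numeric attributes."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Toy data (illustrative only): y depends on x, but not linearly.
xs = [-2, -1, 0, 1, 2] * 20
ys = [x * x for x in xs]

# A one-to-one relabelling of x into categorical codes.
labels = {-2: "a", -1: "b", 0: "c", 1: "d", 2: "e"}
xs_relabelled = [labels[x] for x in xs]

r = pearson(xs, ys)
mi = mutual_information(xs, ys)
mi_relabelled = mutual_information(xs_relabelled, ys)

print(f"Pearson r        = {r:.4f}")
print(f"MI (numeric x)   = {mi:.4f} nats")
print(f"MI (relabeled x) = {mi_relabelled:.4f} nats")
```

In this sketch the correlation is zero even though y is fully determined by x, while the mutual information is positive and does not change when x is relabelled with categorical codes. For continuous attributes a plug-in count like this would not apply directly; some form of discretization or density estimation would be needed.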
