Abstract

Statistical analysis of dependencies in data sets is now one of the most important applications of statistics. It is also a core part of data mining, a part of information technology that has developed rapidly in recent years. Statistical methods proposed for the analysis of dependencies in data sets can be roughly divided into two groups: tests of statistical independence and statistical measures of the strength of dependence. Numerous statistical tests of independence have been developed over the last hundred years or more. They have been developed for many parametric models (such as the test of independence for normally distributed data, based on the Pearson coefficient of correlation ρ) and non-parametric models (such as the test of independence based on the Spearman rank correlation statistic ρS). The relative ease of developing such tests stems from the fact that statistical independence is a very specific feature of data sets. In the case of independence, the probability distributions that describe multivariate statistical data depend exclusively on the marginal probability distributions of the separate components of the vectors of random variables. This feature can hold unconditionally (as is usually assumed in statistical analysis) or conditionally (when the value of a certain latent variable that influences the random variables of interest can be regarded as fixed for the analyzed data set). Although independence can be observed rather frequently in carefully performed statistical experiments, we are of the opinion that in real large data sets perfect statistical independence seldom exists. On the other hand, accepting the assumption of independence is sometimes necessary, e.g., for computational reasons. Therefore, there is often a practical need to soften the independence requirements by defining a state of “near-independence”.
The question then arises of how to evaluate this state using statistical data. The concept of “near-independence” is definitely a vague one. In contrast to independence, which is defined very precisely in terms of probability theory, it seems to be fundamentally impossible to define one measure of the strength
