Abstract

The sample correlation coefficient R is almost universally used to estimate the population correlation coefficient ρ. If the pair (X,Y) has a bivariate normal distribution, this causes no trouble. However, if the marginals are nonnormal, particularly if they have high skewness and kurtosis, the value estimated from a sample may be quite different from the population correlation coefficient ρ. The bivariate lognormal is chosen as the case study for this robustness study. Two approaches are used: (i) simulation and (ii) numerical computation. Our simulation analysis indicates that for the bivariate lognormal, the bias in estimating ρ can be very large if ρ≠0, and it is substantially reduced only with a very large number of observations (three to four million). This phenomenon, though unexpected at first, was found to be consistent with the findings of our numerical analysis.

Highlights

  • The Pearson product-moment correlation coefficient ρ is a measure of linear dependence between a pair of random variables (X,Y)

  • While the properties of R for the bivariate normal are clearly understood, the same cannot be said about nonnormal bivariate populations

  • Although various specific nonnormal populations have been investigated, the messages on the robustness of R are conflicting. Johnson et al. (1995, p. 580) remarked that "Contradictory, confusing, and uncoordinated floods of information on the 'robustness' properties of the sample correlation coefficient R are scattered in dozens of journals." We do not intend to enter into the fray


Summary

Introduction

The Pearson product-moment correlation coefficient ρ is a measure of linear dependence between a pair of random variables (X,Y). The sample (product-moment) correlation coefficient R, derived from n observations of the pair (X,Y), is normally used to estimate ρ. The size of the bias and the variance of R are still rather hazy for general bivariate nonnormal populations when ρ≠0. Although various specific nonnormal populations have been investigated, the messages on the robustness of R are conflicting. Johnson et al. (1995, p. 580) remarked that "Contradictory, confusing, and uncoordinated floods of information on the 'robustness' properties of the sample correlation coefficient R are scattered in dozens of journals." We do not intend to enter into the fray. We compute the lower cross-cumulant ratios that contribute to the terms in n⁻¹ and n⁻² in the mean and variance of R. The magnitudes of these values cause difficulties in estimating the sizes of the bias and variance. They do, however, shed some light on where the difficulties lie.
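To make the setting concrete, the following is a minimal simulation sketch, not the authors' code. It assumes (X,Y) = (exp(U), exp(V)) with (U,V) bivariate normal with zero means, and uses the standard closed form for the resulting population correlation. The function names, parameter names, sample sizes, and seed are all illustrative assumptions.

```python
import numpy as np


def lognormal_corr(rho_n, sigma1, sigma2):
    """Population correlation of (exp(U), exp(V)), where (U, V) is bivariate
    normal with correlation rho_n and standard deviations sigma1, sigma2."""
    num = np.exp(rho_n * sigma1 * sigma2) - 1.0
    den = np.sqrt((np.exp(sigma1**2) - 1.0) * (np.exp(sigma2**2) - 1.0))
    return num / den


def simulate_r(rho_n, sigma1, sigma2, n, reps, seed=0):
    """Draw `reps` samples of size n from the bivariate lognormal and return
    the sample correlation coefficient R of each sample.
    All argument values are hypothetical choices for illustration."""
    rng = np.random.default_rng(seed)
    cov = [[sigma1**2, rho_n * sigma1 * sigma2],
           [rho_n * sigma1 * sigma2, sigma2**2]]
    r = np.empty(reps)
    for i in range(reps):
        uv = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        x, y = np.exp(uv[:, 0]), np.exp(uv[:, 1])
        r[i] = np.corrcoef(x, y)[0, 1]
    return r


if __name__ == "__main__":
    # Illustrative parameters only; larger sigmas give heavier skewness.
    rho = lognormal_corr(0.5, 1.0, 2.0)
    r = simulate_r(0.5, 1.0, 2.0, n=1000, reps=200)
    print("population rho:", rho)
    print("mean of R:", r.mean(), "estimated bias:", r.mean() - rho)
```

Comparing the mean of R with the closed-form ρ over increasing n gives a rough empirical picture of how the bias behaves; the numbers obtained depend on the assumed parameters and seed.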

Sample Correlation of the Bivariate Lognormal Distribution
Simulation Study
Asymptotic Expansions
Conclusion