Abstract

Relationships between primary and secondary data are frequently quantified using the correlation coefficient; however, traditional methods of calculating experimental correlation coefficients are known to be adversely affected by outliers. A new method for calculating a robust correlation coefficient is proposed, based on a weighted average of correlations calculated from different combinations or subsets of the original data. The proposed robust correlation coefficient is shown to have a higher breakdown point than both Pearson's and Spearman's correlation coefficients, as well as two of the three other robust correlation coefficients considered. The least median of squares (LMS) correlation coefficient has the highest possible breakdown point; however, it also tends to give unrealistically high or low correlation coefficients. A simulation study demonstrates the differences between the proposed robust correlation coefficient and the other robust correlation coefficients. When the sample size is small, the uncertainty in the measured correlation can be very large, especially when the measured correlation is low. The uncertainty in the correlation coefficient is calculated from the measured correlation and the number of data. The sampling distribution of the correlation coefficient depends on the number of independent data; however, earth sciences data are often spatially dependent. Thus, a method for calculating an effective number of independent data using the variogram is proposed. An example is presented that applies the developed techniques to a petroleum geostatistics problem. The methodologies presented in this paper are implemented in FORTRAN code made available as part of this paper.
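To illustrate how the uncertainty in a measured correlation depends on the measured value and on the number of independent data, the sketch below uses the standard Fisher z-transformation approximation. This is an assumption made for illustration only: the abstract does not state which sampling distribution the paper uses, and the function name `fisher_interval` and its parameters are hypothetical, not taken from the paper's FORTRAN code.

```python
import math


def fisher_interval(r, n, z_crit=1.959964):
    """Approximate confidence interval for a correlation coefficient.

    Uses the Fisher z-transformation (an assumed, textbook approximation,
    not necessarily the paper's method). `r` is the measured correlation,
    `n` is the number of independent data (an effective number may be
    non-integer), and `z_crit` is the standard normal quantile
    (1.959964 corresponds to a two-sided 95% interval).
    """
    if n <= 3:
        raise ValueError("need more than 3 independent data")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                  # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)        # approximate standard error of z
    lower = math.tanh(z - z_crit * se)  # back-transform to correlation units
    upper = math.tanh(z + z_crit * se)
    return lower, upper


if __name__ == "__main__":
    # Compare interval widths for small vs. large n and low vs. high r.
    for r, n in [(0.3, 10), (0.3, 100), (0.9, 10)]:
        lo, hi = fisher_interval(r, n)
        print(f"r={r}, n={n}: [{lo:.3f}, {hi:.3f}]")
```

Under this approximation the interval widens as the number of independent data falls and as the measured correlation approaches zero, which matches the behaviour described above. If spatial dependence reduces the effective number of independent data, passing that smaller value as `n` widens the interval accordingly.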

