Reversals of statistical relationships that occur when two or more groups of data in a cross tabulation are aggregated were first reported more than a century ago. The reversal was later named Simpson’s paradox, after Simpson’s examples in a seminal paper drew the attention of the statistical community. However, almost all the published cases have been in sociology and biomedical statistics. Does Simpson’s reversal occur in the geosciences? Various examples from petroleum geology and reservoir modeling will be shown in this paper. Boundary conditions for such a reversal will be discussed under a broader framework of sampling analysis. The ecological inference bias, the change of support problem, the modifiable areal unit problem, and the reference class problem will be discussed in relation to Simpson’s paradox in the framework of spatial statistics. It will be demonstrated that the traditional interpretation of the paradox as a result of disproportional sampling based on a contingency table is not always true in the framework of spatial statistics, and that the reversal, while theoretically benign, is inferentially treacherous. Therefore, emphasis will be placed on combining statistical and scientific inferences in geologic modeling and hydrocarbon resource evaluation under various sampling schemes or support effects, with or without a Simpson’s reversal.