Abstract

"Data analysis” instead of “statistics” is a name that allows us to use probability where it is needed and avoid it when we should. Data analysis has to analyze real data. Most real data calls for data investigation, while almost all statistical theory is concerned with data processing. This can be borne, in part because large segments of data investigation are, by themselves, data processing. Summarizing a batch of 20 numbers is a convenient paradigm for more complex aims in data analysis. A particular summary, highly competitive among those known and known about in August 1971, is a hybrid between two moderately complex summaries. Data investigation comes in three stages: exploratory data analysis (no probability), rough confirmatory data analysis (sign test procedures and the like), mustering and borrowing strength (the best of modern robust techniques, and an art of knowing when to stop). Exploratory data analysis can be improved by being made more resistant, either with medians or with fancier summaries. Rough confirmatory data analysis can be improved by facing up to the issues surrounding the choice of what is to be confirmed or disaffirmed. Borrowing strength is imbedded in our classical procedures, though we often forget this. Mustering strength calls for the best in robust summaries we can supply. The sampling behavior of such a summary as the hybrid mentioned above is not going to be learned through the mathematics of certainty, at least as we know it today, especially if we are realistic about the diversity of non-Gaussian situations that are studied. The mathematics of simulation, inevitably involving the mathematically sound “swindles” of Monte Carlo, will be our trust and reliance. I illustrate results for a few summaries, including the hybrid mentioned above. Bayesian techniques are still a problem to the author, mainly because there seems to be no agreement on what their essence is. From my own point of view, some statements of their essence are wholly acceptable and others are equally unacceptable. The use of exogeneous information in analyzing a given body of data is a very different thing (a) depending on sample size and (b) depending on just how the exogeneous information is used. It would be a very fine thing if the questions that practical data analysis has to have answered could be answered by the mathematics of certainty. For my own part, I see no escape, for the next decade or so at least, from a dependence on the mathematics of simulation, in which we should heed von Neumann’s aphorism as much as we can.
