Using Excel to Explore the Effects of Assumption Violations on One-Way Analysis of Variance (ANOVA) Statistical Procedures

Ivan Kelly, William Laverty

doi:10.4236/ojs.2019.94031

Abstract

Understanding any statistical tool requires not only an understanding of the relevant computational procedures but also an awareness of the assumptions on which the procedures are based and of the effects of violating those assumptions. In our earlier articles (Laverty, Miket, & Kelly [1]; Laverty & Kelly [2] [3]), we used Microsoft Excel to simulate both a hidden Markov model and heteroskedastic models, showing different realizations of these models and the performance of the techniques for identifying the underlying hidden states from simulated data. The advantage of using Excel is that the simulations are regenerated whenever the spreadsheet is recalculated, allowing the user to observe the performance of the statistical technique under different realizations of the data. In this article we show how to use Excel to generate data from a one-way ANOVA (Analysis of Variance) model and how the statistical methods behave both when the fundamental assumptions of the model hold and when these assumptions are violated. The purpose of this article is to provide tools that help individuals gain an intuitive understanding of these violations using this readily available program.

Summary

Introduction

An important aspect of any statistical procedure is the set of assumptions on which the procedure is based. If the population is non-normal but has a finite mean and variance (so that the Law of Large Numbers and the Central Limit Theorem apply), the departure from normality will have little effect on the properties of confidence intervals computed assuming normality, provided the sample size is adequately large. Measurements of blood pressure, IQ, or the performance of a political leader, however, may be expected to include extreme values at either end, resulting in non-normal data. In such cases an appropriate model for the departures from the central value would be the t-distribution (a heavy-tailed distribution), and alternatives to ANOVA are appropriate.
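The effect of heavy tails on the one-way ANOVA F test can also be explored outside a spreadsheet. The following is a minimal sketch in Python rather than Excel, and it is not the authors' worksheet: the function names, the choice of three groups of ten observations, the t-distribution with 3 degrees of freedom, and the 2000 replications are all illustrative assumptions. It repeatedly simulates data with equal group means, computes the F statistic, and records how often the test rejects at the 5% level when the errors are normal and when they are t-distributed. With exactly three groups the numerator degrees of freedom equal 2, so the upper-tail F probability has a simple closed form and no statistical library is needed.

```python
import math
import random


def t_variate(rng, df):
    """Draw one Student-t value as a standard normal over sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) is Gamma(df/2, scale 2)
    return z / math.sqrt(chi2 / df)


def f_statistic(groups):
    """Return (F, within-group df) for a one-way ANOVA on a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_within


def p_value_three_groups(f, df_within):
    """Upper-tail probability of F(2, df_within); valid only for numerator df = 2."""
    return (1.0 + 2.0 * f / df_within) ** (-df_within / 2.0)


def rejection_rate(draw, n_per_group=10, reps=2000, alpha=0.05, seed=1):
    """Share of simulated data sets (all group means equal) rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        groups = [[draw(rng) for _ in range(n_per_group)] for _ in range(3)]
        f, df_within = f_statistic(groups)
        if p_value_three_groups(f, df_within) < alpha:
            rejections += 1
    return rejections / reps


# Assumed settings: normal errors versus heavy-tailed t errors with 3 df.
normal_rate = rejection_rate(lambda rng: rng.gauss(0.0, 1.0))
heavy_rate = rejection_rate(lambda rng: t_variate(rng, 3))
print("rejection rate, normal errors:", normal_rate)
print("rejection rate, t(3) errors:  ", heavy_rate)
```

Changing the `seed` argument plays roughly the role of recalculating the spreadsheet: each seed gives a new set of realizations, and comparing the two rejection rates with the nominal 5% level gives a rough sense of how sensitive the test is to the normality assumption.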

Simulation of Data from a Continuous Distribution in Excel

Setting Up the Excel Worksheet to Simulate Anova Data

Generating Simulated Data

Computation of Statistics Required for One-Way ANOVA

Findings

Discussion