Abstract
Researchers often lack knowledge about how to deal with outliers when analyzing their data. Even more frequently, researchers do not pre-specify how they plan to manage outliers. In this paper we aim to improve research practices by outlining what you need to know about outliers. We start by providing a functional definition of outliers. We then lay out an appropriate nomenclature, or classification, of outliers. This nomenclature is used to understand what kinds of outliers can be encountered and serves as a guideline for making appropriate decisions about keeping, deleting, or recoding outliers. These decisions can affect the validity of statistical inferences as well as the reproducibility of experiments. To make informed decisions about outliers, you first need proper detection tools. We remind readers why the most common outlier detection methods are problematic and recommend the median absolute deviation (MAD) to detect univariate outliers and the Mahalanobis distance based on the Minimum Covariance Determinant (Mahalanobis-MCD) to detect multivariate outliers. We created an R package that can be used to perform these detection tests easily. Finally, we promote pre-registration to avoid flexibility in data analysis when handling outliers.
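The abstract recommends the median absolute deviation for univariate detection. The sketch below is a minimal Python illustration, not the R package's API. It assumes the common consistency constant b = 1.4826, which makes the MAD estimate the standard deviation under normality, and a cut-off of 2.5 MADs from the median. Both are parameters chosen for illustration, not values stated in this abstract, and the function names `mad` and `mad_outliers` are hypothetical. The multivariate Mahalanobis-MCD approach needs a robust covariance estimator (for example scikit-learn's `sklearn.covariance.MinCovDet`) and is not shown here.

```python
import statistics


def mad(values, b=1.4826):
    """Median absolute deviation, scaled by b (assumed 1.4826) to estimate the SD under normality."""
    med = statistics.median(values)
    return b * statistics.median(abs(v - med) for v in values)


def mad_outliers(values, threshold=2.5, b=1.4826):
    """Return values lying more than `threshold` scaled MADs from the median (hypothetical helper)."""
    med = statistics.median(values)
    scale = mad(values, b)
    if scale == 0:
        # More than half the values are identical; the MAD cannot define a spread.
        raise ValueError("MAD is zero; cannot define cut-offs")
    lower, upper = med - threshold * scale, med + threshold * scale
    return [v for v in values if v < lower or v > upper]


# Hypothetical scores with one extreme value.
scores = [10, 11, 12, 12, 13, 13, 14, 15, 40]
print(mad_outliers(scores))
```

Here the median is 13 and the scaled MAD is about 1.48, so only 40 falls outside the cut-offs. A mean ± 3 SD rule would not flag it on these data, because the 40 itself inflates the sample SD to about 9.3.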
Highlights
We suggest that: (1) researchers collect enough data that outliers can be removed without compromising statistical power; (2) if outliers are believed to be random, it is acceptable to leave them as they are; (3) if, for pragmatic reasons, researchers must keep values they identified as outliers influenced by moderators, Winsorization or other transformations are acceptable to avoid a loss of power.
To handle situations not anticipated in the pre-registration, or cases where sticking to the pre-registration seems erroneous, we propose three further options, the first of which is to ask judges blind to the research hypotheses to decide whether outliers that do not meet the a priori decision criteria should be included.
In this paper, we stressed why outliers matter in several ways: to detect error outliers, to gain theoretical insights by identifying new moderators that can cause outlying values, and to improve the robustness of statistical analyses.
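The highlights mention Winsorization as a way to keep outliers without losing power. The text does not say which variant is meant. The sketch below assumes one common rank-based form: the k smallest and k largest values are replaced with the nearest remaining value, and the original order of the data is preserved. The function name `winsorize` and the choice of k are illustrative.

```python
def winsorize(values, k=1):
    """Rank-based Winsorization (assumed variant): pull the k most extreme values
    at each end in to the (k+1)-th smallest / largest value, keeping input order."""
    if not 0 <= k < len(values) / 2:
        raise ValueError("k must satisfy 0 <= k < len(values) / 2")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    # Clip each value to [low, high]; values already inside are unchanged.
    return [min(max(v, low), high) for v in values]


scores = [10, 11, 12, 12, 13, 13, 14, 15, 40]
print(winsorize(scores, k=1))
```

With k = 1, the 40 becomes 15 and the 10 becomes 11, so the sample size is unchanged. This is the property that makes Winsorization an alternative to deletion when power is a concern.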
Summary
How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-Registration.
The first definition of outliers is attractive for its simplicity: ‘Data values that are unusually large or small compared to the other values of the same construct’ (Aguinis et al. 2013: 275, Table 1). However, this definition applies only to single constructs, whereas researchers should also consider multivariate outliers (i.e., cases that are outlying because of a surprising pattern across several variables). In a previous paper, Leys et al. (2018) highlight a situation where outliers can be considered heuristic tools, allowing researchers to gain insights into the processes under examination (see McGuire, 1997): ‘Consider a person who would exhibit a very high level of in-group identification but a very low level of prejudice towards a specific out-group. This would count as an outlier under the theory that group identification leads to prejudice towards relevant out-groups.’ The slope of the regression line can be computed as follows:
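To show how a single case like the one in the quotation can change a regression line, the sketch below uses the standard ordinary least squares slope, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)². Using this formula is an assumption, since the paper's own formula is not reproduced here. The identification and prejudice scores are hypothetical numbers chosen for illustration, not data from the paper.

```python
def ols_slope(x, y):
    """Ordinary least squares slope: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variance; slope is undefined")
    return s_xy / s_xx


# Hypothetical scores: identification (x) and prejudice (y) rise together...
identification = [1, 2, 3, 4, 5]
prejudice = [1, 2, 3, 4, 5]
# ...plus one hypothetical person with very high identification but low prejudice.
slope_without = ols_slope(identification, prejudice)
slope_with = ols_slope(identification + [7], prejudice + [1])
print(round(slope_without, 3), round(slope_with, 3))
```

In this toy example, adding that one person drops the slope from 1.0 to about 0.14 (exactly 1/7). One outlying pattern can therefore dominate the estimated relation, even though neither of its scores would necessarily be extreme for its own variable.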