An outlier is an unexpected event or entity. For example, many experts view the Great Financial Crash of 2008 as an outlier event that triggered a reappraisal of the mainstream economic and financial models which define the “expected.” The objective of Outlier Detection in Data Mining is in a similar vein: outliers often embody new information that is hard to explain in the context of existing knowledge and that prompts a re-evaluation of what is known. An important theme in Statistics, Machine Learning, and Data Mining is the use of data to study the “norm” or “expected” behaviour of the underlying phenomenon that generated the data. The presence of outliers often distorts the understanding of the norm, and this has given rise to a set of techniques, often called robust statistics, which discount the effect of outliers. A canonical example is the use of the median (which is less sensitive to outliers) rather than the mean (which is extremely sensitive to outliers) to characterize average behaviour. In Data Mining, by contrast, an outlier is a primary object of study which can potentially lead to the discovery of new “knowledge.” Thus the emphasis has been on the design of algorithms that find outliers in complex scenarios while relaxing as many assumptions about the underlying data-generating model as possible. Put more simply, the focus in Data Mining is on non-parametric (and semi-parametric) outlier detection techniques. Here is a simple example: let D be a multivariate data set, and let the objective be to discover whether there are any outliers in D. If we assume that D was generated from a multi-dimensional Normal distribution, then it is well known that the Mahalanobis distance from data points to the