Multiple change-point detection with a genetic algorithm

A Jann

doi:10.1007/s005000000049

Abstract

A common change-point problem is considered in which the population mean of a random variable is suspected of undergoing abrupt changes in the course of a time series. In practice, no information on the positions or the number of such shifts is usually available beforehand. Finding the change points, i.e. the positions of the shifts, in such a situation is a delicate statistical problem, since any sample under consideration may actually represent a mixture of two or more populations, in which values from both sides of an as yet unrecognized change point have been unknowingly pooled. If this is the case, the underlying assumptions of the statistical two-sample test employed are usually violated. Consequently, no definite decision should be based on just one value of the test statistic. As a precaution, such a value should rather be regarded as only an approximate indicator of the quality of a hypothesis about change-point positions. Given these conclusions, it is found imperative to treat the problem of multiple change-point detection as one of global optimization. A cost function is constructed in such a manner that the change-point configuration yielding the global optimum complies with the requirements of statistical theory as far as possible. The optimization tool used, a genetic algorithm, is both efficient, as it takes advantage of the information about promising change-point positions encountered in previously investigated trial configurations, and flexible, as it is open to any modification of the change-point configuration at any time. Numerical simulation experiments confirm adequate performance of the method in an application where a common change-point detection procedure based on Student's two-sample t-test is used to detect an arbitrary number of shifts in the mean of a normally distributed random variable.
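The approach outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state the actual cost function or the genetic operators, so everything named here is an assumption made for illustration. In particular, the fitness (the sum of |t| minus a threshold `T_CRIT` over adjacent segment pairs), the minimum segment length `MIN_SEG`, and the add/drop/shift mutation with one-point crossover and truncation selection are hypothetical choices. Only the use of Student's two-sample t statistic and the idea of searching over whole change-point configurations come from the text.

```python
import math
import random
from statistics import mean, variance

MIN_SEG = 5    # assumption: minimum segment length, so each variance is defined
T_CRIT = 3.0   # assumption: |t| a change point must exceed to improve the fitness


def t_stat(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    pooled = max(pooled, 1e-12)  # guard against constant segments
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def is_valid(cps, n):
    """True if every segment implied by the sorted change points is long enough."""
    bounds = [0, *cps, n]
    return all(hi - lo >= MIN_SEG for lo, hi in zip(bounds, bounds[1:]))


def fitness(series, cps):
    """Hypothetical fitness: sum of (|t| - T_CRIT) over adjacent segment pairs."""
    bounds = [0, *cps, len(series)]
    total = 0.0
    for lo, mid, hi in zip(bounds, bounds[1:], bounds[2:]):
        total += abs(t_stat(series[lo:mid], series[mid:hi])) - T_CRIT
    return total  # 0.0 for the empty configuration


def mutate(cps, n, rng):
    """Add, drop, or shift one change point (an empty configuration always adds)."""
    cps = set(cps)
    move = rng.choice(("add", "drop", "shift"))
    if move == "add" or not cps:
        cps.add(rng.randrange(MIN_SEG, n - MIN_SEG + 1))
    elif move == "drop":
        cps.remove(rng.choice(sorted(cps)))
    else:
        old = rng.choice(sorted(cps))
        cps.remove(old)
        cps.add(old + rng.randint(-3, 3))
    return tuple(sorted(cps))


def crossover(a, b, n, rng):
    """One-point crossover: change points left of the cut from a, the rest from b."""
    cut = rng.randrange(n)
    return tuple(sorted({c for c in a if c < cut} | {c for c in b if c >= cut}))


def detect(series, pop_size=30, generations=60, seed=0):
    """Search change-point configurations with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(series)

    def score(cps):
        return fitness(series, cps)

    pop = [()]  # start from "no change point", which is valid and has fitness 0
    while len(pop) < pop_size:
        cand = mutate(rng.choice(pop), n, rng)
        if is_valid(cand, n):
            pop.append(cand)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(children) < pop_size - len(parents):
            child = crossover(rng.choice(parents), rng.choice(parents), n, rng)
            child = mutate(child, n, rng)
            if is_valid(child, n):
                children.append(child)
        pop = parents + children
    return max(pop, key=score)
```

Calling `detect(series)` on a list of at least `2 * MIN_SEG` floats returns a sorted tuple of indices at which a new segment is assumed to start. Because each configuration is scored as a whole, a t value computed across an unrecognized change point lowers the fitness of that configuration rather than triggering a decision by itself, which is the point the abstract makes about mixed samples.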
