Prediction in financial markets

Vasant Dhar

doi:10.1145/1961189.1961191

Abstract

Predictive models in regression and classification problems typically have a single model that covers most, if not all, cases in the data. At the opposite end of the spectrum is a collection of models, each of which covers a very small subset of the decision space. These are referred to as “small disjuncts.” The trade-offs between the two types of models have been well documented. Single models, especially linear ones, are easy to interpret and explain. In contrast, small disjuncts do not provides as clean or as simple an interpretation of the data, and have been shown by several researchers to be responsible for a disproportionately large number of errors when applied to out-of-sample data. This research provides a counterpoint, demonstrating that a portfolio of “simple” small disjuncts provides a credible model for financial market prediction, a problem with a high degree of noise. A related novel contribution of this article is a simple method for measuring the “yield” of a learning system, which is the percentage of in-sample performance that the learned model can be expected to realize on out-of-sample data. Curiously, such a measure is missing from the literature on regression learning algorithms. Pragmatically, the results suggest that for problems characterized by a high degree of noise and lack of a stable knowledge base it makes sense to reconstruct the portfolio of small rules periodically.

Full Text