Abstract

In a very high-dimensional vector space, two randomly-chosen vectors are almost orthogonal with high probability. Starting from this observation, we develop a statistical factor model, the random factor model, in which factors are chosen stochastically based on the random projection method. Randomness of factors has the consequence that correlation and covariance matrices are well preserved in a linear factor representation. It also enables derivation of probabilistic bounds for the accuracy of the random factor representation of time-series, their cross-correlations and covariances. As an application, we analyze reproduction of time-series and their cross-correlation coefficients in the well-diversified Russell 3,000 equity index.

Highlights

  • In a high-dimensional vector space, any two unit-length random vectors with random independent components are typically nearly orthogonal to each other [1, 2].

  • In an analysis of five image data sets and five microarray data sets, principal component analysis (PCA) dominated at a small number of dimensions, but its performance deteriorated as the dimension of the data increased, whereas random projection dominated at a high number of dimensions [33].

  • Although PCA reproduces correlation coefficients less accurately than the random factor model (RFM), it gives a better reproduction of the covariance matrix (Fig 4).

Summary

Vectors in a high-dimensional space

In a high-dimensional vector space, any two unit-length random vectors with random independent components are typically nearly orthogonal to each other [1, 2]. One can check that the expected squared length and the expected scalar products of such random vectors are attained with high probability. The maximal size of a collection of ε-quasiorthogonal vectors in R^d grows at least exponentially fast in d for any fixed ε > 0, as proven in [3]. These observations are relevant to time-series analysis, since a long time-series corresponds to a vector in a high-dimensional vector space, and orthogonality of vectors corresponds to uncorrelatedness of time-series. High dimensionality of the data may even be an asset: in a high-dimensional space, almost any set of random vectors yields an almost uncorrelated set of factor time-series that can be used as a basis for a linear factor model.
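
The near-orthogonality claim can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes vectors with independent standard normal components scaled to unit length, and the function names (random_unit_vector, mean_abs_cosine) are hypothetical helpers introduced only for this illustration. It estimates the average absolute scalar product of random unit-vector pairs for several dimensions.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw i.i.d. standard normal components and scale the vector to unit length."""
    components = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in components))
    return [c / norm for c in components]


def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_abs_cosine(dim, pairs, seed=0):
    """Average |u . v| over `pairs` independent pairs of random unit vectors."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    total = 0.0
    for _ in range(pairs):
        u = random_unit_vector(dim, rng)
        v = random_unit_vector(dim, rng)
        total += abs(dot(u, v))
    return total / pairs


if __name__ == "__main__":
    # The average |cosine| should shrink as the dimension grows.
    for dim in (10, 100, 1000):
        print(dim, mean_abs_cosine(dim, pairs=100))
```

For unit vectors the scalar product equals the cosine of the angle between them, so values near zero mean near-orthogonality. Under the stated assumptions the average absolute cosine is on the order of 1/sqrt(d), which is why a long time-series (large d) makes random factor time-series almost uncorrelated.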

Factor models
Choice of factors
Correlation structure and random matrices
The random projection
This study
Notations
Linear factor models
Random projection
Random factor model
Principal component analysis
Comparison of factor models
Reproduction of time-series
Volatility
Correlation coefficient
Covariance
Specific volatility and explanatory power
Impact of the market factor
Universality
Probability distributions
Universality of distributions
Random factors as explanatory factors
Findings
Conclusions