High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm

Dieter Hendricks, Diane Wilcox, Tim Gebbie

doi:10.17159/sajs.2016/20140340


Open Access



Journal: South African Journal of Science
Publication Date: Feb 1, 2016
Citations: 11
License type: CC BY 4.0

Abstract

We implement a master-slave parallel genetic algorithm with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. The algorithm runs on graphics processing units (GPUs), and we visualise the results using disjoint minimal spanning trees. We demonstrate that our GPU parallel genetic algorithm, implemented on a commercially available general-purpose GPU, is able to recover stock clusters at sub-second speed, based on a subset of stocks in the South African market. This approach represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable because of compiler differences. Combined with fast online intraday correlation matrix estimation from high-frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners.

Highlights

Advances in technology underpinning multiple domains have increased the capacity to generate and store data and metadata relating to domain processes.
We introduce a maintainable and scalable master-slave parallel genetic algorithm (PGA) framework for unsupervised cluster analysis on the Compute Unified Device Architecture (CUDA) platform, which is able to detect clusters using the Giada and Marsili likelihood function.
We have verified that the Giada and Marsili¹¹ likelihood function is a viable, parallelisable approach for isolating residual clusters in data sets on a graphics processing unit (GPU) platform.
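
The master-slave pattern named in the highlights can be illustrated with a short sketch. The Python code below is not the authors' CUDA implementation: a thread pool stands in for the GPU slaves that evaluate fitness, while the master performs selection, crossover and mutation. The function name `master_slave_ga`, the encoding of a cluster configuration as a list of integer labels, and all parameter values are assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def master_slave_ga(fitness, n_genes, n_labels, pop_size=40, generations=50,
                    mutation_rate=0.05, workers=4, seed=0):
    """Maximise `fitness` over integer-label vectors (hypothetical sketch).

    Each individual is a list of `n_genes` labels in range(n_labels),
    for example one cluster label per stock.
    """
    rng = random.Random(seed)
    population = [[rng.randrange(n_labels) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Slaves: each individual's fitness is evaluated independently.
            scores = list(pool.map(fitness, population))

            # Master: remember the best individual evaluated so far.
            for individual, score in zip(population, scores):
                if score > best_score:
                    best, best_score = list(individual), score

            def tournament():
                # Binary tournament: the fitter of two random individuals.
                a, b = rng.randrange(pop_size), rng.randrange(pop_size)
                return population[a] if scores[a] >= scores[b] else population[b]

            # Master: elitism, uniform crossover and per-gene mutation.
            children = [list(best)]
            while len(children) < pop_size:
                p1, p2 = tournament(), tournament()
                child = [g1 if rng.random() < 0.5 else g2
                         for g1, g2 in zip(p1, p2)]
                child = [rng.randrange(n_labels)
                         if rng.random() < mutation_rate else g
                         for g in child]
                children.append(child)
            population = children

    return best, best_score
```

A call such as `master_slave_ga(sum, n_genes=12, n_labels=2)` runs the loop on a toy fitness that counts the ones in a bit string. Python threads give no real speed-up for a pure-Python fitness function, so the sketch shows only the division of labour between master and slaves, not the GPU performance reported in the paper.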

Summary

Introduction

Advances in technology underpinning multiple domains have increased the capacity to generate and store data and metadata relating to domain processes. Giada and Marsili¹¹ propose an unsupervised, parameter-free approach to finding data clusters, based on the maximum likelihood principle. They derive a log-likelihood function with which a given cluster configuration can be assessed to determine whether it represents the inherent structure of the data set: cluster configurations which approach the maximum log-likelihood are better representatives of the data structure. We introduce a maintainable and scalable master-slave parallel genetic algorithm (PGA) framework for unsupervised cluster analysis on the CUDA platform, which is able to detect clusters using the Giada and Marsili likelihood function. Applying the proposed cluster analysis approach and examining the clustering behaviour of financial instruments offers a unique perspective for monitoring the intraday characteristics of the stock market and detecting structural changes in near real time. The Pearson correlation is +1 in the case of a perfect positive linear relationship, -1 in the case of a perfect negative linear relationship and some value between -1 and +1 in all other cases, with values close to 0 signalling negligible linear interdependence.
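
As an illustration of how a cluster configuration can be scored from a Pearson correlation matrix, the following Python sketch uses the form of the Giada and Marsili log-likelihood as it is commonly stated, L_c = 1/2 Σ_{s: n_s > 1} [ ln(n_s / c_s) + (n_s - 1) ln((n_s² - n_s) / (n_s² - c_s)) ], where n_s is the size of cluster s and c_s is the sum of all correlations within it. This summary does not reproduce the formula, so the expression, the function names and the handling of degenerate clusters are assumptions rather than the authors' code.

```python
import numpy as np


def pearson_correlation(returns):
    """Pearson correlation matrix of the columns of a (T x N) array."""
    return np.corrcoef(returns, rowvar=False)


def cluster_log_likelihood(corr, labels):
    """Log-likelihood of a cluster configuration (hypothetical sketch).

    corr   : (N x N) Pearson correlation matrix
    labels : length-N sequence of integer cluster labels
    """
    labels = np.asarray(labels)
    total = 0.0
    for s in np.unique(labels):
        members = np.flatnonzero(labels == s)
        n_s = members.size
        if n_s < 2:
            continue  # singleton clusters contribute nothing
        # Intra-cluster correlation, diagonal included.
        c_s = corr[np.ix_(members, members)].sum()
        # Assumption: clusters outside n_s < c_s < n_s**2 are skipped,
        # since the logarithms are undefined or divergent there.
        if c_s <= n_s or c_s >= n_s ** 2:
            continue
        total += 0.5 * (np.log(n_s / c_s)
                        + (n_s - 1) * np.log((n_s ** 2 - n_s)
                                             / (n_s ** 2 - c_s)))
    return total
```

Under this form, a configuration that groups strongly correlated series together scores higher than one that mixes unrelated series, and a configuration made up only of singletons scores zero. A function of this shape is what a genetic algorithm would use as its fitness.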

Clustering procedures

Results

Conclusion