New Dataset for Forecasting Realized Volatility: Is the Tokyo Stock Exchange Co-Location Dataset Helpful for Expansion of the Heterogeneous Autoregressive Model in the Japanese Stock Market?

Shigeyuki Hamori, Takuji Kinkyo, Katsuyuki Tanaka, Takuo Higashide

doi:10.3390/jrfm14050215

Open Access

https://doi.org/10.3390/jrfm14050215

Abstract

This study examines how useful the Tokyo Stock Exchange Co-Location dataset (TSE Co-Location dataset) is for forecasting the realized volatility (RV) of Tokyo stock price index futures. The heterogeneous autoregressive (HAR) model is a popular linear regression model used to forecast RV. This study expands the HAR model with the TSE Co-Location dataset, the stock full-board dataset, and the market volume dataset, using the random forest method, a popular nonlinear machine learning algorithm. The TSE Co-Location dataset is new and is the only source of information on the transaction status of high-frequency traders. In contrast, the stock full-board dataset shows whether buying or selling dominates. The market volume dataset is used as a proxy for liquidity and is recognized as important information in finance. To the best of our knowledge, this study is the first to use the TSE Co-Location dataset. The experimental results show that our model yields higher out-of-sample forecast accuracy for RV than the HAR model. Moreover, we find that the TSE Co-Location dataset has become more important in recent years, along with the increasing importance of high-frequency trading.
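For readers unfamiliar with the HAR baseline mentioned above, the following is a minimal sketch of the standard HAR regression, in which next-day RV is regressed on the latest daily RV and on its weekly and monthly averages. The 5-day and 22-day windows, the function names `har_features` and `ols`, and the synthetic series are assumptions made for illustration; they are not taken from the paper, and the paper's own estimation setup may differ.

```python
import random

def har_features(rv, week=5, month=22):
    """Build HAR rows: X = [1, daily RV, weekly mean RV, monthly mean RV], y = next-day RV.
    The window lengths (5 and 22 trading days) are conventional choices, assumed here."""
    X, y = [], []
    for t in range(month - 1, len(rv) - 1):
        weekly = sum(rv[t - week + 1:t + 1]) / week
        monthly = sum(rv[t - month + 1:t + 1]) / month
        X.append([1.0, rv[t], weekly, monthly])
        y.append(rv[t + 1])
    return X, y

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (A[i][k] - tail) / A[i][i]
    return beta

# Synthetic RV series for illustration only (not real market data).
random.seed(0)
rv = [1.0 + 0.1 * random.random() for _ in range(22)]
for _ in range(500):
    d, w, m = rv[-1], sum(rv[-5:]) / 5, sum(rv[-22:]) / 22
    rv.append(0.1 + 0.4 * d + 0.3 * w + 0.2 * m + 0.05 * random.gauss(0, 1))

X, y = har_features(rv)
beta = ols(X, y)
print("intercept, daily, weekly, monthly:", [round(b, 3) for b in beta])
```

In an expanded model of the kind the abstract describes, the same three HAR columns would sit alongside additional predictor columns, with a nonlinear learner such as a random forest replacing the linear fit.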

Highlights

Forecasting volatility is important for financial risk management.
This study suggests a new approach for realized volatility (RV) forecasts of Tokyo stock price index (TOPIX) futures.
Our model combines the heterogeneous autoregressive (HAR) dataset with the TSE Co-Location dataset and the stock full-board dataset, both of which are related to high-frequency trading (HFT), and the market volume dataset, using the random forest method.

Summary

Introduction

Forecasting volatility is important for financial risk management. Volatility is treated as a random variable that varies from day to day and represents the uncertainty of asset returns. Many previous studies address time-series modeling for volatility forecasting (Engle 1982; Taylor 1982; Bollerslev 1986; Nelson 1991; Glosten et al. 1993; Ding et al. 1993; Baillie et al. 1996; Harvey 1998). Luong and Dokuchaev (2018) introduced a nonlinear model using the random forest method, which is a well-known machine learning method introduced by Breiman (2001). They apply the random forest method to forecast the direction (“up” or “down”) of RV in a binary classification framework using a technical indicator of RV.
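The direction-forecasting setup described above can be illustrated with a short sketch of how binary targets and one input feature might be built from an RV series. The moving-average ratio used as the indicator, the function names, and the toy numbers are hypothetical choices for illustration; the text does not specify which technical indicator Luong and Dokuchaev (2018) use.

```python
def direction_labels(rv):
    """Label day t 'up' if RV on day t+1 exceeds RV on day t, else 'down'."""
    return ["up" if nxt > cur else "down" for cur, nxt in zip(rv, rv[1:])]

def ma_ratio(rv, window=5):
    """Hypothetical indicator: RV divided by its trailing moving average.
    The first value corresponds to day index window - 1."""
    return [rv[t] / (sum(rv[t - window + 1:t + 1]) / window)
            for t in range(window - 1, len(rv))]

# Toy series for illustration only (not real market data).
rv = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 1.4, 1.2]
window = 5
features = ma_ratio(rv, window)             # one value per day from index 4 onward
labels = direction_labels(rv)[window - 1:]  # direction of the move following each such day

# The last day has a feature but no next-day label, so zip drops it.
for x, lab in zip(features, labels):
    print(round(x, 3), lab)
```

Each (feature, label) pair produced this way could serve as one training example for a classifier such as a random forest.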

Methods

Results

Conclusion