Abstract

With the rise of the Big Data paradigm, new tasks for prediction models have appeared. In addition to the volume problem of such data sets, nonlinearity becomes important, as more detailed data sets also contain more comprehensive information, e.g. about irregular seasonal or cyclical movements as well as jumps in time series. This essay compares two nonlinear methods for predicting a high-frequency time series, the USD/Euro exchange rate. The first method investigated is the Autoregressive Neural Network Process (ARNN), a neural-network-based nonlinear extension of classical autoregressive process models from time series analysis (see Dietz 2011). Its advantage is its simple but scalable time series process model architecture, which is able to include all kinds of nonlinearities, based on the universal approximation theorem of Hornik, Stinchcombe and White (1989) and the extensions of Hornik (1993). However, restrictions related to the numerical estimation procedures limit the flexibility of the model. The alternative is a Support Vector Machine model (SVM; Vapnik 1995). The two methods follow different approaches to error minimization (empirical error minimization for the ARNN vs. structural error minimization for the SVM). Our new finding is that time series data classified as “Big Data” need new methods for prediction. Estimation and prediction were performed using the statistical programming language R. Besides the prediction results, we also discuss the impact of Big Data on the data preparation and model validation steps.
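To make the comparison concrete, the following is a minimal sketch in R (not the paper's original code) of how an ARNN-type model and an SVM regression can be fitted to lagged exchange-rate data. The contributed packages nnet and e1071, the vector rates, the lag order p and the network size are illustrative assumptions.

  # Illustrative sketch: AR-type neural network vs. SVM regression on lagged data.
  # 'rates' is a hypothetical numeric vector of USD/EUR observations.
  library(nnet)    # single-hidden-layer feed-forward network
  library(e1071)   # SVM regression (epsilon-insensitive loss)

  p   <- 5                                   # autoregressive order (number of lags)
  emb <- embed(rates, p + 1)                 # column 1: y_t, columns 2..p+1: lags 1..p
  dat <- data.frame(y = emb[, 1], emb[, -1])
  names(dat)[-1] <- paste0("lag", 1:p)

  # ARNN-style model: y_t = f(y_{t-1}, ..., y_{t-p}) with one hidden layer,
  # estimated by minimizing the empirical (least-squares) error.
  arnn_fit <- nnet(y ~ ., data = dat, size = 3, linout = TRUE,
                   decay = 1e-4, maxit = 500, trace = FALSE)

  # SVM regression on the same lag structure (structural risk minimization).
  svm_fit <- svm(y ~ ., data = dat, kernel = "radial")

  # In-sample one-step-ahead fits for a first comparison.
  pred_arnn <- predict(arnn_fit, dat)
  pred_svm  <- predict(svm_fit, dat)

The nnet fit stands in for the ARNN's empirical error minimization and the e1071 svm fit for the SVM's structural error minimization; both are generic substitutes for the paper's actual specifications.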

Highlights

  • The origins of the expression “Big Data” trace back to the turn of the millennium

  • As we deal with high-frequency data here and are interested in long-term predictions, we choose 25,000 observations of the total data as the estimation subset and the remaining 6,986 observations as the validation subset

  • The procedure is as follows: the model is estimated once, a prediction is made for one hour ahead, the values used for prediction are updated with the new observations, and a prediction is again made for one hour ahead (see the sketch after this list)
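A minimal sketch of this rolling one-hour-ahead scheme in R is given below. It is illustrative only, assuming a fitted model object fit (e.g. the ARNN or SVM fit sketched above), the hypothetical rates vector, and the 25,000/6,986 split from the highlights.

  # Rolling one-hour-ahead validation (illustrative sketch, not the original code).
  n_est <- 25000                       # estimation subset
  n_val <- 6986                        # validation subset
  p     <- 5                           # lag order, as in the fit above

  est     <- rates[1:n_est]
  val     <- rates[(n_est + 1):(n_est + n_val)]
  history <- est                       # values available for prediction
  preds   <- numeric(n_val)

  for (i in seq_len(n_val)) {
    lags <- rev(tail(history, p))      # y_{t-1}, ..., y_{t-p}
    newdata <- as.data.frame(t(lags))
    names(newdata) <- paste0("lag", 1:p)
    preds[i] <- predict(fit, newdata)  # forecast one hour ahead
    history <- c(history, val[i])      # update inputs with the new observation
  }

  rmse <- sqrt(mean((preds - val)^2))  # out-of-sample error on the validation subset

The model itself is not re-estimated inside the loop; only the lagged inputs are refreshed with each new observation, matching the procedure described in the highlight above.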


Summary

Introduction

The origins of the expression “Big Data” trace back to the turn of the millennium. It was first mentioned in the context of data mining (see Weiss and Indurkhya 1998) and econometrics (see Diebold 2000). The term describes a paradigm shift in data models which can be explained as follows: increasing volume, variety and velocity of data (the 3 V’s, mentioned in an unpublished research note at META Group from 2001) lead on the one hand to more missing or corrupted values and on the other hand to noisier data streams. To address these challenges, new data preprocessing and data analysis methods are necessary. One of their basic features should be the ability to capture the nonlinearity that comes along with more frequent and noisier data.

