Abstract

Order execution cost analysis is one of the most important problems in financial investment. Many previous works model the problem as a cost classification or regression task. However, because real orders are scarce, the performance of those models is unsatisfactory. Moreover, the unlimited simulated orders that market simulators can generate are not exploited by these approaches. In this paper, we propose an order execution cost estimation approach that uses limited real orders together with unlimited simulated orders. The approach 1) employs exploratory data analysis to explore the patterns and relationships in the raw data and selects appropriate features for model training, 2) trains supervised models on labeled orders as baselines for estimating order execution cost, and 3) trains three Semisupervised Learning (SSL) models on both labeled and simulated orders to improve estimation performance, where a) the Semisupervised Support Vector Machine (S3VM) performs low-density separation on labeled and unlabeled orders, b) Tri-Training performs bootstrap sampling on the labeled orders to obtain three labeled training sets, creating the disagreement among classifiers that is used to label unlabeled orders, and c) the Label Propagation (LP) model propagates the order execution cost labels of the labeled orders to the unlabeled ones on a graph and adjusts the labels based on local and global consistency. Experiments are conducted on real and simulated order datasets. The results show that the SSL models outperform the baselines: S3VM optimized by Adam, Random Forest (RF) based Tri-Training, and Radial Basis Function (RBF) based LP make use of the information in unlabeled orders to substantially improve classification performance in terms of F1 score.
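To make the label propagation step in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common normalized-graph iteration F ← αSF + (1 − α)Y over an RBF affinity graph, which is one standard way to realize "local and global consistency". The function names, the parameters `alpha`, `gamma`, and `n_iter`, and the toy one-dimensional data are all hypothetical.

```python
import math


def rbf_affinity(points, gamma=1.0):
    """Pairwise RBF weights W[i][j] = exp(-gamma * ||xi - xj||^2), zero diagonal."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-gamma * d2)
    return W


def spread_labels(points, labels, n_classes, alpha=0.9, gamma=1.0, n_iter=200):
    """Propagate labels over an RBF graph.

    labels[i] is a class index for a labeled point, or None for an unlabeled one.
    Returns one predicted class index per point.
    """
    n = len(points)
    W = rbf_affinity(points, gamma)
    deg = [sum(row) for row in W]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # One-hot matrix of the initial labels; unlabeled rows stay all zero.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        # Each point mixes its neighbors' scores (local) with its initial label (global).
        F = [
            [
                alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
                for c in range(n_classes)
            ]
            for i in range(n)
        ]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Toy data: two well-separated clusters, one labeled point in each.
toy_points = [(0.0,), (0.2,), (0.4,), (3.0,), (3.2,), (3.4,)]
toy_labels = [0, None, None, 1, None, None]
predicted = spread_labels(toy_points, toy_labels, n_classes=2)
```

In this toy setting the unlabeled points inherit the label of the labeled point in their own cluster, which mirrors how labels of real orders would flow to nearby simulated orders in feature space.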

Highlights

  • With the development of Fintech, machine learning models have been used by more and more investment institutions to guide their investments and cost analysis, and one of the most important tasks is to analyze the execution cost of their trading orders, which are executed through established trading models. Order execution cost analysis is usually formulated as a classification or regression task in a machine learning workflow, and many research works have focused on estimating orders' slippage [1], [2]

  • The weighted average F1-score over all classes is 0.67, which indicates that the S3VM optimized by Adam correctly discriminates most order execution costs. The results show that it outperforms the S3VM optimized by Stochastic Gradient Descent (SGD), i.e., the adaptive optimizer is better at estimating order execution cost

  • To address the dilemma that investment institutions face in estimating order execution cost, we build SSL models on a large number of unlabeled simulated orders and a small number of labeled real orders
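The weighted average F1-score quoted in the highlights can be illustrated with a short sketch. It assumes the usual definition: the F1 of each class is averaged with weights equal to that class's share of the true labels. The class names `"low"` and `"high"` and the toy labels are hypothetical and are not taken from the paper's datasets.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never hit.
        f1 = 2 * tp / denom if denom else 0.0
        score += (n_cls / total) * f1
    return score


# Hypothetical cost classes for four orders.
y_true = ["low", "low", "low", "high"]
y_pred = ["low", "low", "high", "high"]
score = weighted_f1(y_true, y_pred)
```

Weighting by class share keeps a rare cost class from dominating the average, which matters when the labeled real orders are few and unevenly distributed across classes.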

Summary

INTRODUCTION

With the development of Fintech, machine learning models have been used by more and more investment institutions to guide their investments and cost analysis, and one of the most important tasks is to analyze the execution cost of their trading orders, which are executed through established trading models. Moreover, it is expensive for researchers to generate order data in a real market, since executing real trading orders requires a large amount of money. As a result, only a small dataset is available for analysis, and models trained on such a small dataset have poor estimation performance. In order to make full use of both real and simulated data, we propose an order execution cost analysis approach based on SSL.

RELATED WORKS
TRI-TRAINING
LABEL PROPAGATION
ORDER DATASETS
EXECUTION COST LABELING
EXPERIMENTS
SSL MODEL SETTING
Findings
CONCLUSION
