Abstract
This chapter focuses on the process of developing data cleaning algorithms. Virtually no data set is perfect, and financial engineers spend large amounts of time cleaning errors and resolving issues in data, a step sometimes called preprocessing. Failing to consider the impact of bad data adequately can lead to bad models or, worse, to systems that pass the backtest stage but lose millions in actual trading. Serious data cleaning involves more than visually scanning data and updating bad records with better values. In this step, the product team investigates alternative data cleaning methods to arrive at benchmark processes, the ones that generate the best performance. Benchmarking data cleaning processes focuses the product team on improving the performance of the trading/investment system. Cleaning the data removes variation from the process, so that the team can get a clean look at the process, its common variation, and the potential of the system. First, the developers have to identify and categorize all the types of problems they expect to encounter in their data; then they have to survey the available techniques for addressing those types of errors; and finally they have to develop methods to identify and resolve the problems. To support this, trading firms need senior-level people to design Data Transformation Management Systems. These tools should be automated to the extent possible, so that junior-level financial engineers can investigate errors and outliers graphically using scatterplots, SPC charts, and histograms. All cleaning algorithms should be documented and benchmarked, and the methods for calculating national best bid and offer prices should be standardized.
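The SPC-style outlier investigation mentioned above can be sketched as a simple control-limit test on a price series. This is an illustrative sketch, not the chapter's method: the function name `flag_outliers`, the 20-tick trailing window, and the 3-sigma threshold are all assumptions chosen for the example.

```python
# Hypothetical SPC-style check: flag ticks falling outside the
# mean +/- k*sigma control limits of a trailing window.
# Window size and k are illustrative assumptions, not from the chapter.
from statistics import mean, stdev

def flag_outliers(prices, window=20, k=3.0):
    """Return indices of prices lying outside k-sigma control limits
    computed over the preceding `window` observations."""
    flagged = []
    for i in range(window, len(prices)):
        ref = prices[i - window:i]
        mu, sigma = mean(ref), stdev(ref)
        if sigma > 0 and abs(prices[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged

# Usage: a clean, slowly cycling series with one bad print at index 25.
ticks = [100 + 0.1 * (i % 5) for i in range(30)]
ticks[25] = 150.0  # fat-finger tick
print(flag_outliers(ticks))  # -> [25]
```

In practice a record flagged this way would not be deleted automatically; it would be routed to a financial engineer for graphical review, consistent with the workflow the abstract describes.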