Abstract

The recent deceleration of Moore's law calls for new approaches to resource optimization. Machine learning has been applied to a wide variety of problems across many domains; however, the space of machine learning research for storage optimization remains only lightly explored. In this paper, we focus on learning I/O access patterns with the aim of improving the performance of flash-based devices. Flash-based storage devices provide orders of magnitude better performance than HDDs, but they suffer from high tail latencies due to garbage collection (GC), which makes I/O latency variable. In flash devices, GC is the process of relocating valid data and erasing stale data in order to create empty blocks for new incoming writes. By learning the temporal trends of I/O accesses, we built workload-specific regression models for predicting the future time at which the SSD will be in GC mode. We tested our models on synthetic traces (random read/write mixes with a fixed block size) generated by the FIO workload generator. To determine when the SSD is in GC mode, we track I/O completion times and classify completions that take more than 10 times the median completion time as occurring while the SSD is in GC mode. Experiments on the SSD models we tested reveal that a GC phase usually lasts 400 ms and occurs every 7,000 ms on average. Results show that our workload-specific models are accurate in predicting the time of the next GC phase, achieving an RMSE of 10.61.

The performance of flash devices can be further improved through efficient prefetching that learns I/O access patterns. We use a long short-term memory (LSTM) recurrent neural network (RNN) architecture to learn spatial patterns from block-level I/O traces from SNIA, in order to predict the logical block address (LBA) that will be requested ahead of time so it can be placed in primary memory. Preliminary results show that the neural-network-based prefetchers are effective in predicting the next requested LBA, achieving up to 82.5% accuracy. Our LSTM models are also effective in predicting future I/O operation types (read/write), achieving 91.6% accuracy. We used a four-layer neural network architecture with an LSTM layer containing 512 neurons and three fully connected layers containing 256, 64, and 1,000 neurons, respectively. Time series models such as LSTMs are efficient at learning local temporal trends in data, which is useful for learning storage I/O patterns. This work opens a new direction toward time series neural-network-based prefetching, and the approach can be applied to a variety of problems in storage systems.

Unsupervised machine learning techniques can be used to cluster I/O accesses and store offsets in different blocks based on access patterns. The strategy is to cluster the offsets and store data in different physical blocks based on the frequency of writes to those blocks. Separating hot and cold data in this way will minimize the write amplification associated with GC and improve performance. Newly launched multi-stream SSDs provide an ideal opportunity to apply this idea to improve quality of service.
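The GC-detection heuristic above is simple enough to sketch directly. In the following minimal example, the function name, the NumPy-based interface, and the sample values are our own illustrative choices; only the 10x-median threshold comes from the abstract.

```python
import numpy as np

def flag_gc_completions(latencies_ms, factor=10):
    """Return a boolean mask marking I/O completions whose latency
    exceeds `factor` times the median completion time; flagged
    completions are treated as occurring while the SSD is in GC mode."""
    lat = np.asarray(latencies_ms, dtype=float)
    return lat > factor * np.median(lat)

# Example: a latency spike well above 10x the median is flagged as GC.
mask = flag_gc_completions([0.10, 0.12, 0.11, 4.0, 0.10])
# mask -> [False, False, False, True, False]
```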
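The abstract does not specify the features of the workload-specific regression models, so the sketch below assumes the simplest possible formulation: fit a least-squares line through the timestamps of past GC onsets and extrapolate one step ahead. Treat the function and its interface as hypothetical.

```python
import numpy as np

def predict_next_gc_onset(gc_onsets_ms):
    """Fit a linear trend (event index -> onset time) through past
    GC-onset timestamps and extrapolate to the next event. With GC
    occurring roughly every 7,000 ms, the fitted slope approximates
    the average inter-GC interval."""
    onsets = np.asarray(gc_onsets_ms, dtype=float)
    idx = np.arange(len(onsets))
    slope, intercept = np.polyfit(idx, onsets, 1)
    return slope * len(onsets) + intercept

# Example: with GC seen roughly every 7,000 ms, the next onset
# is predicted near 28,000 ms.
print(predict_next_gc_onset([0.0, 7010.0, 13990.0, 21000.0]))
```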
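The layer widths of the prefetching network (a 512-unit LSTM followed by fully connected layers of 256, 64, and 1,000 units) are given in the abstract; everything else in this Keras sketch, including the sequence length, the single-feature input encoding, the ReLU activations, the interpretation of the final 1,000-way softmax as classification over frequent LBAs, and the optimizer/loss, is an assumption of ours.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lba_predictor(seq_len=32, n_lba_classes=1000):
    """Four-layer network: one LSTM layer (512 units) followed by
    fully connected layers of 256, 64, and 1,000 units. The final
    softmax is assumed to classify the next request among the
    n_lba_classes most frequent LBAs in the trace."""
    model = models.Sequential([
        layers.Input(shape=(seq_len, 1)),   # assumed: window of past LBAs
        layers.LSTM(512),
        layers.Dense(256, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_lba_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A model of the same shape with a two-way output head would cover the read/write operation-type prediction task.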
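For the hot/cold separation direction, the abstract names clustering by write frequency but not a specific algorithm; the sketch below uses k-means from scikit-learn as one plausible choice, mapping each LBA region to a multi-stream SSD stream ID. The region granularity and stream count are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_streams(write_counts_per_region, n_streams=4):
    """Cluster LBA regions by write frequency so hot and cold data
    can be directed to different streams of a multi-stream SSD.
    Returns one stream ID per region."""
    # Log-scale the counts so a few very hot regions do not dominate.
    features = np.log1p(np.asarray(write_counts_per_region, dtype=float))
    km = KMeans(n_clusters=n_streams, n_init=10, random_state=0)
    return km.fit_predict(features.reshape(-1, 1))

# Example: hot and cold regions land in different streams.
print(assign_streams([5, 3, 9000, 12000, 7, 2], n_streams=2))
```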
