Abstract

The increasing popularity of flash memory means that more database systems will run on flash memory in the future. External sorting is one of the most important database operations, so this paper studies the problem of efficient external sorting on flash memory. In contrast to most previous work, we target the situation where previously sorted data have become progressively unsorted due to data updates. Accordingly, we call this ‘partially’ sorted data. We re-sort partially sorted data by exploiting its partially sorted nature to speed up the run generation phase of the traditional external merge sort. We do this by finding ‘naturally occurring’ page runs in the partially sorted data. During the run generation phase, our algorithm can perform up to a factor of 1024 less write I/O than a traditional external merge sort. We map the problem of finding naturally occurring runs to the shortest-path problem in a directed acyclic graph (DAG) and propose an optimal solution using the well-known DAG-Shortest-Paths algorithm. However, we found that the optimal solution was too slow for even moderate-sized data sets. We therefore propose a fast heuristic solution that, as we show experimentally, finds a high percentage of page runs with minimal computational overhead. Experiments using both real and synthetic data sets show that our heuristic algorithm can halve the external sorting time when compared with three likely competing external sorting algorithms.
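
For readers unfamiliar with the DAG-Shortest-Paths algorithm named above, the following is a minimal Python sketch of the generic textbook technique: relax every edge once, visiting nodes in topological order. It does not reproduce the paper's graph construction. How pages and runs map to nodes, edges, and weights is not stated in this abstract, so the function name `dag_shortest_paths`, the integer node ids, and the `(u, v, weight)` edge format are all assumptions made for illustration.

```python
from collections import deque
from math import inf


def dag_shortest_paths(num_nodes, edges, source):
    """Single-source shortest distances in a weighted DAG.

    Hypothetical interface (not from the paper): nodes are the integers
    0..num_nodes-1 and edges is a list of (u, v, weight) tuples.
    Unreachable nodes keep a distance of infinity.
    """
    adj = [[] for _ in range(num_nodes)]
    indegree = [0] * num_nodes
    for u, v, w in edges:
        adj[u].append((v, w))
        indegree[v] += 1

    # Topological order via Kahn's algorithm.
    queue = deque(n for n in range(num_nodes) if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle, so it is not a DAG")

    # Relax each edge exactly once, in topological order.
    dist = [inf] * num_nodes
    dist[source] = 0
    for u in order:
        if dist[u] == inf:
            continue  # u is not reachable from the source
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist


if __name__ == "__main__":
    example_edges = [(0, 1, 2), (0, 2, 5), (1, 2, 1), (2, 3, 3)]
    print(dag_shortest_paths(4, example_edges, source=0))
```

The sketch runs in time linear in the number of nodes plus edges. Whether that cost explains the slowness the abstract reports depends on the size of the graph the paper builds, which is not given here.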
