Abstract

External sorting is a basic algorithm for big data processing, but it suffers from massive read and write operations on external memory. Recent works offload part of the data processing from the host to the solid state drive (SSD) to reduce data transmission. However, the internal memory of the SSD is limited, and data can be retained in that memory longer than necessary during the merge phase. To improve memory efficiency, we propose an algorithm named ISort. Specifically, we build an index table between the memory and the page addresses. In the merge phase, the index table determines the order in which pages are read according to their minimum values, and the pages are read into memory in that order, which reduces the amount of data residing in memory and improves memory efficiency. Since the merge phase is performed inside the SSD, ISort can take advantage of the high IO bandwidth within the SSD to speed up the merge. Because channel utilization directly influences performance, we search for the optimal ratio of read and write channels by comparing the read and write performance of the “specialized channel” and the “hybrid channel”. Experimental results show that ISort maintains a better data processing speed when SSD memory is limited, outperforming other robust algorithms. In addition, the algorithm performs better with the crossover strategy than with the specialization strategy.
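
The index-guided merge described above can be illustrated with a minimal sketch. It is one plausible reading of the idea, not the ISort implementation: it assumes each sorted run is split into pages, that the index table stores one (minimum value, run, page) entry per page, and that pages are loaded in ascending order of their minimum. The names `build_index_table` and `index_guided_merge` are hypothetical, and plain Python lists stand in for SSD pages.

```python
import heapq


def build_index_table(runs):
    """Return one (min_value, run_id, page_id) entry per page, sorted by minimum.

    Assumption: every run is sorted, so the first record of a page is its minimum.
    """
    table = [
        (page[0], run_id, page_id)
        for run_id, run in enumerate(runs)
        for page_id, page in enumerate(run)
        if page
    ]
    table.sort()
    return table


def index_guided_merge(runs):
    """Merge sorted runs by loading pages in index-table order.

    Returns the merged records and the peak number of records buffered.
    """
    buffer = []   # min-heap of records currently held in memory
    output = []
    peak = 0
    for min_value, run_id, page_id in build_index_table(runs):
        # Every page not yet loaded has a minimum >= min_value, so any
        # buffered record <= min_value is final and can leave memory now.
        while buffer and buffer[0] <= min_value:
            output.append(heapq.heappop(buffer))
        for record in runs[run_id][page_id]:
            heapq.heappush(buffer, record)
        peak = max(peak, len(buffer))
    # No pages remain, so the rest of the buffer drains in sorted order.
    while buffer:
        output.append(heapq.heappop(buffer))
    return output, peak
```

Calling `index_guided_merge` on a list of paged runs returns the merged records together with the peak buffer occupancy, which gives a rough way to see how the page read order affects how much data resides in memory.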
