Abstract
Learned sorting, first proposed by Google, sorts data by using a machine learning model to predict the placement position of each unsorted element in the sorted sequence. It pioneers a new generation of sorting algorithms and shows great potential because of its theoretical O(N) time complexity and its amenability to hardware-driven acceleration. However, learned sorting has two problems: controlling the monotonicity and boundedness of the predicted placement positions, and resolving placement conflicts among repetitive elements. In this paper, a new learned sorting algorithm named LS is proposed. LS integrates a back propagation neural network with the technique of the look-up table (LUT) to guarantee the monotonicity and boundedness of the predicted placement positions, and it employs a data structure called the self-regulating index to tentatively store and duly update placement positions, eliminating potential placement conflicts. Results of three controlled experiments demonstrate that LS effectively controls monotonicity and boundedness, consumes less time than quick sort and Google's learned sorting, and remains stable as the data size or the number of repetitive elements increases.
Highlights
Sorting is a fundamental data operation that produces a sorted sequence, in ascending or descending order, from an unsorted dataset
We find that learned sorting has two problems: the predicted placement positions must be kept monotonic and bounded, and repetitive elements can cause placement conflicts; constructing the cumulative distribution function (CDF) of an unsorted dataset with a machine learning model for placement position prediction is the crucial first step of learned sorting
We introduce the technique of the look-up table (LUT) into a back propagation (BP) neural network for placement position prediction
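The CDF-based prediction described above can be sketched in a few lines. This is an illustrative stand-in, not the authors' BP-network implementation: the model is replaced by a monotone lookup table of sample quantiles, which is enough to show how a LUT yields placement positions that are monotonic in the key and bounded to the array.

```python
import bisect
import random

def build_lut(sample, buckets=256):
    """Build a monotone lookup table approximating the CDF of the data.

    Illustrative only: the paper trains a BP neural network, whereas here
    the table entries are evenly spaced sample quantiles, which are
    non-decreasing by construction, so predicted positions are monotone.
    """
    qs = sorted(sample)
    return [qs[min(i * len(qs) // buckets, len(qs) - 1)] for i in range(buckets)]

def predict_position(lut, x, n):
    """Predicted placement position of x in a sorted array of length n.

    bisect_right on the sorted LUT gives an approximate CDF value in [0, 1];
    scaling by n and clamping guarantees boundedness to [0, n - 1].
    """
    cdf = bisect.bisect_right(lut, x) / len(lut)
    return min(int(cdf * n), n - 1)

if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(1000)]
    lut = build_lut(data)
    positions = [predict_position(lut, x, len(data)) for x in sorted(data)]
    # Monotone: positions never decrease as keys increase.
    assert all(a <= b for a, b in zip(positions, positions[1:]))
    # Bounded: every position is a valid index.
    assert all(0 <= p < len(data) for p in positions)
```

Note that nearby repetitive keys map to the same predicted position, which is exactly the placement-conflict problem the paper's self-regulating index is designed to resolve.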
Summary
Sorting is a fundamental data operation that produces a sorted sequence, in ascending or descending order, from an unsorted dataset. It is a crucial part of various advanced algorithms and systems [1, 2]. Approaches to accelerating sorting fall into two categories. The first category comprises hardware-driven accelerating approaches that reduce sorting time by using multicore CPUs/GPUs or computer clusters to conduct parallel sorting [5,6,7,8]. Such approaches require massive computing resources and do not reduce the time complexity of sorting. Approaches of the second category, called non-comparison sorting, distribute the input data into intermediate data structures, then gather and place the elements in a sorted sequence [9,10,11,12]. The time complexity of non-comparison sorting algorithms, such as self-indexed sort and Qureshi sort, is slightly superior to that of quick sort, but their memory consumption is fairly large and the data types they support are limited [11, 12]
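Counting sort is a classic illustration of the non-comparison category described above (it is not one of the cited algorithms): elements are distributed into an intermediate count array and then gathered back in order, with no pairwise comparisons, at the cost of memory proportional to the key range.

```python
def counting_sort(data, max_key):
    """Non-comparison sort for integer keys in [0, max_key].

    Runs in O(N + K) time, where K = max_key + 1 is the size of the
    intermediate count array; memory grows with the key range, which
    is the trade-off noted for non-comparison sorting.
    """
    counts = [0] * (max_key + 1)
    for x in data:          # distribute: tally each key
        counts[x] += 1
    out = []
    for key, c in enumerate(counts):  # gather: emit keys in order
        out.extend([key] * c)
    return out

if __name__ == "__main__":
    print(counting_sort([3, 1, 4, 1, 5, 2], 5))  # [1, 1, 2, 3, 4, 5]
```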