Abstract
Software-based thread-level speculation (TLS) is a technique that optimistically executes in parallel those loops whose fully-parallel semantics cannot be guaranteed at compile time. Modern TLS libraries allow arbitrary data structures to be handled speculatively. This desirable feature comes at the high cost of local-store and/or remote-recovery times: the easier the local store, the harder the remote recovery. Unfortunately, both times are on the critical path of any TLS system. In this paper we propose a solution that performs local store in constant time, while recovering values in a time that is on the order of T, with T the number of threads. As we will see, this solution, together with some additional improvements, makes the difference between slowdowns and noticeable speedups in the speculative parallelization of non-synthetic, pointer-based applications on a real system. Our experimental results show a gain of 3.58× to 28× with respect to the baseline system, and a relative efficiency of up to, on average, 65 % with respect to a TLS implementation specifically tailored to the benchmarks used.
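The constant-time store / linear-time recovery trade-off described above can be illustrated with a per-thread slot scheme. This is a hypothetical sketch, not the paper's actual data structure: each speculatively accessed element keeps one slot per thread, so a speculative write touches only the writer's own slot (O(1)), while reading back the most-recent valid value may require scanning up to T slots (O(T)).

```python
UNSET = object()  # sentinel: "this thread has not written this element"

class SpecElement:
    """Hypothetical per-element version store for T speculative threads.

    Threads are assumed to be ordered by speculative age: a higher
    thread id denotes a more-speculative (younger) thread.
    """
    def __init__(self, num_threads, committed):
        self.committed = committed           # last non-speculative value
        self.slots = [UNSET] * num_threads   # one private slot per thread

    def spec_store(self, tid, value):
        # O(1): the writer touches only its own slot.
        self.slots[tid] = value

    def spec_load(self, tid):
        # O(T): scan from this thread back through older threads for the
        # most-recent speculative value; fall back to the committed one.
        for t in range(tid, -1, -1):
            if self.slots[t] is not UNSET:
                return self.slots[t]
        return self.committed
```

Under this scheme, squashing a misspeculated thread is also cheap (reset its slot to `UNSET`), which is consistent with keeping both operations off long critical sections.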
Highlights
Many problems that arise in real-world networks imply the computation of the shortest paths, and their distances, from a source to any destination point.
This section describes the results of our experiments and the performance comparisons for the scenarios and input sets described in the previous section.
In order to clarify the figure, we have only shown the results of one graphics processing unit (GPU) board, in this case Titan, because the execution times for the remaining platforms present the same trends with respect to size and degree.
Summary
Many problems that arise in real-world networks imply the computation of the shortest paths, and their distances, from a source to any destination point. Algorithms to solve shortest-path problems are computationally costly. The Single-Source Shortest Path (SSSP) problem is a classical optimization problem. The classical algorithm that solves the SSSP problem is Dijkstra's algorithm [4]. With n = |V| and m = |E|, the time complexity of this algorithm is O(n²). This complexity is reduced to O(m + n log n) when special data structures are used, as with the implementation of Dijkstra's algorithm included in the Boost Graph Library [5], which exploits relaxed-heap structures. The efficiency of Dijkstra's algorithm is based on the ordering of previously computed results. This feature makes its parallelization a difficult task. Under certain situations, this ordering can be permuted without leading to wrong results or performance losses.
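For reference, the priority-queue formulation of Dijkstra's algorithm mentioned above can be sketched as follows (a minimal illustration using a binary heap rather than the relaxed heaps of the Boost implementation, so its bound is O(m log n)):

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest distances.

    graph: dict mapping each node to a list of (neighbor, weight) pairs,
    with non-negative weights as Dijkstra's algorithm requires.
    """
    dist = {source: 0}
    pq = [(0, source)]  # min-heap of (tentative distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue  # stale entry: a shorter path to u was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

The heap pops nodes in non-decreasing distance order; it is precisely this strict ordering, noted above as the source of the algorithm's efficiency, that makes parallelization difficult.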