Abstract

Software-based thread-level speculation (TLS) is a technique that optimistically executes in parallel loops whose fully parallel semantics cannot be guaranteed at compile time. Modern TLS libraries make it possible to handle arbitrary data structures speculatively. This desirable feature comes at a high cost in local store and/or remote recovery times: the easier the local store, the harder the remote recovery. Unfortunately, both times are on the critical path of any TLS system. In this paper we propose a solution that performs local stores in constant time, while recovering values in a time that is on the order of $T$, where $T$ is the number of threads. As we will see, this solution, together with some additional improvements, makes the difference between slowdowns and noticeable speedups in the speculative parallelization of non-synthetic, pointer-based applications on a real system. Our experimental results show gains of 3.58$\times$ to 28$\times$ with respect to the baseline system, and a relative efficiency of up to, on average, 65% with respect to a TLS implementation specifically tailored to the benchmarks used.
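As an illustration only, the sketch below shows one way a per-thread versioning scheme can give constant-time local stores while bounding value recovery by the number of threads $T$: each thread writes only to its own slot, and a reader walks back through at most $T$ predecessor slots. All names (VersionedVar, NUM_THREADS, spec_store, spec_load) and the data layout are assumptions made for this example, not the actual structures of the library described in the paper.

```cpp
// Illustrative sketch (not the paper's actual library) of the store/recovery
// tradeoff: O(1) speculative store, O(T) value recovery.
#include <array>
#include <cstdio>
#include <optional>

constexpr int NUM_THREADS = 4;   // T: number of speculative threads (assumption)

template <typename T>
struct VersionedVar {
    T committed{};                                    // non-speculative value
    std::array<std::optional<T>, NUM_THREADS> slot;   // one version slot per thread

    // Local store: write only to this thread's slot -> constant time.
    void spec_store(int tid, const T& v) { slot[tid] = v; }

    // Recovery: walk back from this thread towards the least-speculative
    // thread, so at most T slot reads are needed before falling back to
    // the committed value.
    T spec_load(int tid) const {
        for (int p = tid; p >= 0; --p)
            if (slot[p]) return *slot[p];
        return committed;
    }
};

int main() {
    VersionedVar<int> x;
    x.committed = 10;
    x.spec_store(1, 42);                  // thread 1 speculatively writes 42
    std::printf("%d\n", x.spec_load(3));  // thread 3 recovers 42 from thread 1
    std::printf("%d\n", x.spec_load(0));  // thread 0 still sees the committed 10
    return 0;
}
```

The example is single-threaded for clarity; a real TLS runtime would additionally need synchronization and squash/commit logic, which are outside the scope of this sketch.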

Highlights

  • Many problems that arise in real-world networks involve computing the shortest paths, and their distances, from a source to any destination point

  • We present the results of our experiments and the performance comparisons for the described scenarios and input sets

  • For clarity, the figure only shows the results for one graphics processing unit (GPU) board, in this case Titan, because execution times on the remaining platforms follow the same trends with respect to size and degree


Summary

Introduction

Many problems that arise in real-world networks involve computing the shortest paths, and their distances, from a source to any destination point. Algorithms that solve shortest-path problems are computationally costly. The Single-Source Shortest Path (SSSP) problem is a classical optimization problem, and the classical algorithm that solves it is Dijkstra’s algorithm [4]. With $n = |V|$ and $m = |E|$, the time complexity of this algorithm is $O(n^2)$. This complexity is reduced to $O(m + n \log n)$ when special data structures are used, as in the implementation of Dijkstra’s algorithm included in the Boost Graph Library [5], which exploits relaxed-heap structures. The efficiency of Dijkstra’s algorithm relies on the ordering of previously computed results, a feature that makes its parallelization a difficult task. Under certain situations, this ordering can be permuted without leading to wrong results or performance losses.
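To make the relaxation-and-ordering idea concrete, here is a minimal, self-contained sketch of Dijkstra's algorithm using a standard binary heap (std::priority_queue), which gives $O((n + m) \log n)$; a relaxed heap, as used in the Boost Graph Library, would replace it to reach the $O(m + n \log n)$ bound. The graph and names are illustrative assumptions, not taken from the paper.

```cpp
// Minimal sketch of Dijkstra's SSSP with a binary heap (illustrative only).
#include <cstdio>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;            // (target vertex, edge weight)
using Graph = std::vector<std::vector<Edge>>;

std::vector<long long> sssp(const Graph& g, int source) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(g.size(), INF);
    using Item = std::pair<long long, int>;  // (tentative distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d != dist[u]) continue;          // stale queue entry, skip it
        for (auto [v, w] : g[u]) {
            if (dist[u] + w < dist[v]) {     // relax edge (u, v)
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}

int main() {
    // Tiny example graph: 0->1 (7), 0->2 (2), 2->1 (3), 1->3 (1)
    Graph g(4);
    g[0] = {{1, 7}, {2, 2}};
    g[2] = {{1, 3}};
    g[1] = {{3, 1}};
    for (auto d : sssp(g, 0)) std::printf("%lld ", d);  // prints: 0 5 2 6
    std::printf("\n");
    return 0;
}
```

The ordering constraint mentioned above shows up in the priority queue: vertices must be settled in nondecreasing distance order, which is what makes a straightforward parallelization difficult.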


