Abstract
Efficient transactional execution is desirable for parallel implementations of algorithms that perform graph refinement. Hardware transactional memory (HTM) promises transactional execution that is both easy to use and efficient. Long HTM transactions, however, abort with high probability because of hardware limitations. Unfortunately, Delaunay mesh refinement (DMR), a graph-refinement algorithm for mesh generation, produces long transactions, so a parallel implementation naively based on HTM performs poorly. To use HTM efficiently in a parallel implementation of DMR, we present an approach to shortening transactions. Our HTM-based implementations of DMR achieved significantly higher throughput and better scalability than a naive HTM-based implementation and lock-based implementations. On a quad-core Haswell processor, the absolute speedup of one of our implementations was up to 2.64 with 16 threads.