Fast Bit-Reversals on Uniprocessors and Shared-Memory Multiprocessors

Zhao Zhang, Xiaodong Zhang

doi:10.1137/s1064827599359709

Abstract

In this paper, we examine methods that use blocking, buffering, and padding to implement bit-reversals efficiently. We evaluate the merits and limits of each technique, together with the application-dependent and architecture-dependent conditions under which it yields cache-optimal methods. Besides testing the methods on several uniprocessors, we conducted both simulations and measurements on two commercial symmetric multiprocessors (SMPs) to provide architectural insight into the methods and their implementations. This paper makes two contributions: (1) Our integrated blocking methods, which match the cache associativity and the translation-lookaside buffer (TLB) size and fully use the available registers, are cache-optimal and fast. (2) Our padding methods outperform other software-oriented methods, and we believe they are the fastest in terms of minimizing both CPU cycles and memory access cycles. Because the padding methods are almost hardware-independent, they could be widely used on uniprocessor workstations and multiprocessors.
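
To make the blocking and buffering ideas concrete, here is a minimal Python sketch. It assumes the usual definition of the bit-reversal permutation, y[rev(i)] = x[i] for an array of N = 2^n elements, where rev(i) reverses the n-bit binary representation of i. It is not the authors' implementation, and the helper names (bit_reverse, log2_exact, naive_bitrev, blocked_bitrev) and the block parameter q are hypothetical. The blocked version splits each index into a high field a and a low field c of q bits each plus a middle field b, so that rev(i) is the concatenation of rev(c), rev(b), and rev(a). For each b, it reads B = 2^q short contiguous runs of x into a B x B buffer, then writes B contiguous runs of y from that buffer. Python does not expose cache behavior, so the sketch only illustrates the memory access pattern.

```python
def bit_reverse(i, nbits):
    """Return the nbits-bit reversal of the non-negative integer i."""
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (i & 1)  # move the lowest bit of i into r
        i >>= 1
    return r


def log2_exact(N):
    """Return n such that N == 2**n, or raise ValueError."""
    n = N.bit_length() - 1
    if N < 1 or N != 1 << n:
        raise ValueError("length must be a power of two")
    return n


def naive_bitrev(x):
    """Reference permutation y[rev(i)] = x[i].

    Consecutive values of i differ in their low bits, so consecutive
    writes to y land up to N/2 elements apart.
    """
    n = log2_exact(len(x))
    y = [None] * len(x)
    for i, v in enumerate(x):
        y[bit_reverse(i, n)] = v
    return y


def blocked_bitrev(x, q):
    """Blocked bit-reversal through a B x B buffer, with B = 2**q.

    Index i is split into fields (a, b, c) of q, n - 2q, and q bits,
    so rev(i) has fields (rev(c), rev(b), rev(a)).
    """
    n = log2_exact(len(x))
    if q < 0 or 2 * q > n:
        raise ValueError("need 0 <= 2*q <= log2(len(x))")
    B = 1 << q
    m = n - 2 * q                       # width of the middle field b
    rev_q = [bit_reverse(k, q) for k in range(B)]
    tile = [[None] * B for _ in range(B)]
    y = [None] * len(x)
    for b in range(1 << m):
        rb = bit_reverse(b, m)
        # Gather: row rev(a) of the tile receives the B consecutive
        # elements of x whose high field is a and middle field is b.
        for a in range(B):
            src = (a << (n - q)) | (b << q)
            row = tile[rev_q[a]]
            for c in range(B):
                row[c] = x[src | c]
        # Scatter: column c of the tile is written to B consecutive
        # elements of y whose high field is rev(c) and middle field is rev(b).
        for c in range(B):
            dst = (rev_q[c] << (n - q)) | (rb << q)
            for ra in range(B):
                y[dst | ra] = tile[ra][c]
    return y
```

For example, naive_bitrev(list(range(8))) returns [0, 4, 2, 6, 1, 5, 3, 7], and blocked_bitrev(x, q) returns the same list as naive_bitrev(x) for any valid q. In a compiled implementation, q would be tuned to the target machine; the integrated blocking methods described above match the blocking to cache associativity and TLB size and fully use the available registers, which this sketch does not attempt.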


Journal: SIAM Journal on Scientific Computing (a publication of the Society for Industrial and Applied Mathematics)
Publication Date: Jan 1, 2001
Citations: 27

Similar Papers

Cache-optimal methods for bit-reversals
Zhao Zhang ... Xiaodong Zhang
01 Jan 1998

DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory
Carlos Villavieja ... Adrian Cristal
01 Oct 2011

System architectures based on functionality offloading
...
01 Jan 2008

Performance analysis of re-configurable partitioned TLBs
D Channon ... D Koch
03 Jan 1997
