Abstract

Inner product computation is an important operation, invoked repeatedly in matrix multiplications. A high-speed inner product processor can be very useful (among many possible applications) in real-time signal processing. This paper presents the design of a fast inner product processor, with appreciably reduced latency and cost. The inner product processor is implemented with a tree of carry-propagate or carry-save adders; this structure is obtained with the incorporation of three innovations in the conventional multiply/add tree: (1) The leaf-multipliers are expanded into adder subtrees, thus achieving an O(log Nb) latency, where N denotes the number of elements in a vector and b the number of bits in each element. (2) The partial products, to be summed in producing an inner product, are reordered according to their “minimum alignments.” This reordering brings approximately a 20% saving in hardware—including adders and data paths. The reduction in adder widths also yields savings in carry propagation time for carry-propagate adders. (3) For trees implemented with carry-save adders, the partial product reordering also serves to truncate the carry propagation chain in the final propagation stage by 2 log b − 1 positions, thus significantly reducing the latency further. A form of the Baugh and Wooley algorithm is adopted to implement two's complement notation with changes only in peripheral hardware.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call