Diminished-1 representation of modulo-$(2^q+1)$ residues enables delay-balanced adders for the popular moduli set $\tau=\{2^q,\,2^q\pm1\}$. In addition, the conjugacy of the moduli $2^q\pm1$ leads to efficient multiple-residue-to-binary conversion. Nevertheless, the cardinality of $\tau$ does not cover the dynamic range required by some applications that also need small values of $q$ to reach the desired arithmetic speed (e.g., convolutional neural networks). The recently proposed parallel prefix modulo-$(2^q+2^{q-1}-1)$ addition scheme provides balanced performance, with similar adder architectures, for the moduli of the doubly wide $\tau$ (i.e., $\tau^{2}=\{2^{2q},\,2^{2q}\pm1\}$). If a compatible adder existed for the conjugate modulus $2^q+2^{q-1}+1$, the new moduli set $\tau^{+2}=\{2^{2q},\,2^{2q}\pm1,\,2^q+2^{q-1}\pm1\}$ could provide more than $(8q-3)$ bits of dynamic range, with adders as fast as those of $\tau^{2}$, whose dynamic range is only $6q$ bits. We were therefore motivated to propose the new modulus $2^q+2^{q-1}+1$ and an equally fast parallel prefix adder for it, at a cost comparable to that of its conjugate modulus. This is achieved via a diminished-1 representation of residues, in which residues in $[1,\,2^q+2^{q-1}]$ are encoded as $[0,\,2^q+2^{q-1}-1]$ and a zero-indicator bit is set for 0-valued residues. VLSI simulation and synthesis results, especially for the common cases of $q=2^p$, confirm the analytical evaluations regarding the balanced performance of the conjugate moduli $2^q+2^{q-1}\pm1$.
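The diminished-1 encoding described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: it models the number representation and the arithmetic rule for modulus $m = 2^q+2^{q-1}+1$ in software and does not reproduce the parallel prefix adder architecture of the paper. The names `D1`, `modulus`, `encode`, `decode`, and `add` are hypothetical and chosen for illustration.

```python
from typing import NamedTuple


class D1(NamedTuple):
    """Diminished-1 residue: a zero-indicator bit plus a (q+1)-bit magnitude."""
    zero: bool   # set only for the 0-valued residue
    value: int   # residue minus 1; ignored when `zero` is set


def modulus(q: int) -> int:
    """Return m = 2^q + 2^(q-1) + 1 (assumes q >= 1)."""
    return (1 << q) + (1 << (q - 1)) + 1


def encode(r: int, q: int) -> D1:
    """Map a residue r in [0, m-1] to its diminished-1 form."""
    m = modulus(q)
    if not 0 <= r < m:
        raise ValueError("residue out of range")
    # Nonzero residues 1..m-1 are stored as 0..m-2.
    return D1(True, 0) if r == 0 else D1(False, r - 1)


def decode(d: D1) -> int:
    """Recover the ordinary residue from its diminished-1 form."""
    return 0 if d.zero else d.value + 1


def add(a: D1, b: D1, q: int) -> D1:
    """Modulo-m addition carried out on diminished-1 operands."""
    m = modulus(q)
    if a.zero:
        return b
    if b.zero:
        return a
    # (A + B) - 1 = (A - 1) + (B - 1) + 1, reduced modulo m.
    t = (a.value + b.value + 1) % m
    # t == m - 1 corresponds to a sum congruent to 0, so raise the zero flag.
    return D1(True, 0) if t == m - 1 else D1(False, t)
```

For example, with `q = 4` the modulus is 25, and `decode(add(encode(24, 4), encode(1, 4), 4))` evaluates to 0 with the zero-indicator bit set. Since the stored magnitude never exceeds $2^q+2^{q-1}-1$, it fits in $q+1$ bits, the same width needed for residues of the conjugate modulus $2^q+2^{q-1}-1$.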