Abstract

This brief presents a novel pipelined algorithm for transposing an $N \times N$ matrix, together with a modular architecture that implements it. The architecture is optimal in both memory and latency: it uses the minimum number of registers necessary to transpose an $N \times N$ matrix and achieves the theoretical minimum latency. It is composed of a series of identical cascaded basic circuits and has a simple control strategy. Furthermore, the algorithm and architecture extend easily to $p$-parallel operation, where $p$ is any factor of $N$.
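To make the notion of a latency lower bound concrete, the following is a minimal behavioural sketch in Python, not the brief's architecture. It assumes a serial stream (one element per cycle), row-major input order, and output in row-major order of the transposed matrix; the function names `min_latency` and `stream_transpose` are hypothetical. Under these assumptions, element $(i, j)$ arrives at cycle $iN + j$ and must leave at cycle $L + jN + i$, so the latency $L$ must be at least the largest value of $(i - j)(N - 1)$.

```python
def min_latency(n: int) -> int:
    """Smallest latency L such that no element is emitted before it arrives.

    Assumed model: element (i, j) arrives at cycle i*n + j and is emitted
    at cycle L + j*n + i, so L >= (i*n + j) - (j*n + i) for every (i, j).
    """
    return max((i * n + j) - (j * n + i) for i in range(n) for j in range(n))


def stream_transpose(stream, n):
    """Behavioural model of a serial streaming transpose.

    Returns (output_stream, peak_buffer_occupancy). The dict stands in for
    storage only; it does not model registers, multiplexers, or control.
    """
    latency = min_latency(n)
    buffer = {}   # elements that have arrived but have not yet been emitted
    out = []
    peak = 0
    for t in range(n * n + latency):
        if t < n * n:
            buffer[divmod(t, n)] = stream[t]   # key is (row, col)
        k = t - latency                        # index into the output stream
        if k >= 0:
            j, i = divmod(k, n)                # output k is original (i, j)
            out.append(buffer.pop((i, j)))
        # Measured at the end of the cycle, so an element that arrives and
        # leaves in the same cycle is not counted as stored.
        peak = max(peak, len(buffer))
    return out, peak


if __name__ == "__main__":
    n = 4
    matrix = list(range(n * n))                # row-major 4 x 4 input
    transposed, peak = stream_transpose(matrix, n)
    print(min_latency(n), peak, transposed)
```

In this toy model the bound evaluates to $(N-1)^2$ cycles, and the peak number of buffered elements equals the same value. These figures follow from the stated assumptions only and are not taken from the brief, which may define latency and memory differently, particularly in the $p$-parallel case.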
