Vector-SIMD architectures have gained increasing attention because of their high performance in signal-processing applications. However, the performance of existing vector-SIMD architectures remains limited because they coordinate their different hardware units inefficiently. To solve this problem, this article proposes the FT-Matrix architecture, which improves the coordination of traditional vector-SIMD architectures in three respects: cooperation between the scalar and SIMD units is refined with a dynamic coupling execution scheme, communication among SIMD lanes is enhanced with matrix-style communication, and data sharing among vector memory banks is accomplished by an unaligned vector memory accessing scheme. Evaluation results show an average performance gain of 58.5 percent over vector-SIMD architectures without the proposed improvements. A four-core chip with each core built on the FT-Matrix architecture is also under fabrication.