Abstract

Within a few years it will be possible to integrate a billion transistors on a single chip. At this integration level, we propose using a high level ISA to express parallelism to hardware instead of using a huge transistor budget to dynamically extract it. Since the fundamental data structures for a wide variety of applications are scalar, vector, and matrix, our proposed Trident processor extends the classical vector ISA with matrix operations. The Trident processor consists of a set of parallel vector pipelines (PVPs) combined with a fast in order scalar core. The PVPs can access both vector and matrix register files to perform vector, matrix, and matrix-vector operations. One key point of our design is the exploitation of up to three levels of data parallelism. Another key point is the ring register files for storing vector and matrix data. The ring structure of the register files reduces the number and size of the address decoders, the number of ports, the area overhead caused by the address bus, and the number of registers attached to bit lines, as well as providing local communication between PVPs. The scalability of the Trident processor does not require more fetch, decode, or issue bandwidth, but requires replication of PVPs and increasing the register file size. Scientific, engineering, multimedia, and many other applications, which are based on a mixture of scalar, vector, and matrix operations, can be speeded up on the Trident processor.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.