Abstract

This paper introduces a fundamentally new computer architecture for supercomputers. The core module is application-compatible with an existing superscalar microprocessor, minimizes energy use, and is optimized for local sparse matrix operations. Optimized sparse matrix manipulation is discussed by analyzing the High Performance Conjugate Gradient (HPCG) benchmark specification. This analysis shows how the DRAM memory wall is removed for this benchmark, and for sparse matrix models of partial differential equations (PDEs) across a wide cross section of applications. Giving the programmer improved control over the configuration of the supercomputer minimizes the potential for communication problems. Application compatibility is achieved while removing the superscalar instruction interpreter and multi-thread controller from the existing microprocessor’s hardware. These are transformed into compile-time utilities. The instruction cache is removed through an innovation in VLIW instruction processing. The data caches are unnecessary and are turned off in order to optimally implement sparse matrix models.

Highlights

  • Today’s high-performance superscalar microprocessor includes a superscalar instruction interpreter [12], instruction and data caches, as well as a multi-thread controller [19]

  • This paper introduces a core module known as the Simultaneous Multi-Processor (SiMulPro) core module, which removes all of these problems

  • Removing the hardware overhead of the superscalar instruction interpreter, multi-thread controller, and instruction cache requires that the SiMulPro core module be semantically compatible with the existing microprocessor of Figure 1


Summary

Introduction

1. The C1-adders of pipe 3 match the index list of a received package to determine which global object is being referenced, whether the package should be stored in this core module, and whether the global object uses finite difference or finite element indexing.

3. Pattern Recognizers (PatRecs) use either the index list (for finite element indexing) or the derived geometric addresses to determine which vector store holds the package's numeric data.

The vectors are initialized, with each core module storing the relevant components for its model locally. At this point, the PatMan cores in each core module determine which row, or rows, can be processed to alter the vector components. The PatMan cores use the received vector updates to update the readiness of the local sparse rows for processing. When a row is ready for processing, the PatMan sends a message to the FP and Int cores, which is queued until the cores are ready to begin those operations.
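The readiness bookkeeping described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware mechanism: it assumes a row becomes ready once every vector component referenced by its nonzero columns has been received, and it stands in for the message queue to the FP and Int cores with an ordinary FIFO. The names `RowReadinessTracker`, `receive_update`, and `next_ready_row` are hypothetical and do not appear in the source.

```python
from collections import deque


class RowReadinessTracker:
    """Hypothetical software model of per-row readiness tracking.

    Assumption: a local sparse row is ready once all vector components
    referenced by its nonzero columns have been received.
    """

    def __init__(self, row_columns):
        # row_columns maps a local row id to the column indices
        # (vector components) that the row's nonzeros reference.
        self.pending = {row: set(cols) for row, cols in row_columns.items()}
        # Reverse map: vector component -> rows that wait on it.
        self.waiters = {}
        for row, cols in row_columns.items():
            for col in cols:
                self.waiters.setdefault(col, []).append(row)
        # FIFO standing in for messages queued for the FP and Int cores.
        self.ready_queue = deque()
        # Rows with no dependencies are ready immediately.
        for row, cols in self.pending.items():
            if not cols:
                self.ready_queue.append(row)

    def receive_update(self, component):
        """Record a received vector component; queue rows that become ready."""
        for row in self.waiters.get(component, []):
            needed = self.pending[row]
            if component in needed:
                needed.discard(component)
                if not needed:
                    self.ready_queue.append(row)

    def next_ready_row(self):
        """Return the next queued ready row, or None if nothing is queued."""
        return self.ready_queue.popleft() if self.ready_queue else None
```

For example, with three rows of a tridiagonal-like pattern, `RowReadinessTracker({0: [0, 1], 1: [0, 1, 2], 2: [1, 2]})`, row 0 is queued only after both components 0 and 1 have arrived, regardless of arrival order.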

Software Development Today And Tomorrow
Threads
Local Implementation Of Sparse Matrix Solvers For A Geometric Neighborhood
Basic Systems Analysis Of Sparse Matrix Performance
Findings
Summary