Abstract

We report performance measurements made on the 2-CPU CRAY X-MP at ECMWF, Reading. Vector (SIMD) performance on one CPU is interpreted by the two parameters ($r_\infty$, $n_{1/2}$), and we find for dyadic operations using FORTRAN $r_\infty = 70$ Mflop/s, $n_{1/2} = 53$ flop. All-vector triadic operations produce $r_\infty = 107$ Mflop/s, $n_{1/2} = 45$ flop; and a triadic operation with two vectors and one scalar gives $r_\infty = 148$ Mflop/s and $n_{1/2} = 60$ flop. MIMD performance using both CPUs on one job is interpreted with the two parameters ($r_\infty$, $s_{1/2}$), where $s_{1/2}$ is the amount of arithmetic that could have been done during the time taken to synchronize the two CPUs. We find, for dyadic operations using the TSKSTART and TSKWAIT synchronization primitives, that $r_\infty = 130$ Mflop/s and $s_{1/2} = 5700$ flop. This means that a job must contain more than ∼6000 floating-point operations if it is to run at more than 50% of the maximum performance when split between both CPUs by this method. Less expensive synchronization methods using LOCKS and EVENTS reduce $s_{1/2}$ to 4000 flop and 2000 flop respectively. A simplified form of LOCK synchronization written in CAL code further reduces $s_{1/2}$ to 220 flop. This is probably the minimum possible value for synchronization overhead on the CRAY X-MP.
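
The two-parameter descriptions above can be illustrated with a short sketch. The abstract does not state the formula, so the code assumes the usual form of this parameterization, in which the achieved rate for an amount of work $w$ is $r = r_\infty / (1 + w_{1/2}/w)$, with $w_{1/2}$ standing for either $n_{1/2}$ or $s_{1/2}$. The function names and the dictionary layout are hypothetical, chosen only for illustration. Applying the TSKSTART/TSKWAIT value $r_\infty = 130$ Mflop/s to the other synchronization methods is also an assumption, since the abstract quotes only their $s_{1/2}$ values.

```python
def achieved_rate(r_inf, half_param, work):
    """Assumed model: rate = r_inf / (1 + half_param / work).

    r_inf      -- asymptotic rate in Mflop/s
    half_param -- n_1/2 or s_1/2 in flop
    work       -- vector length or job size in flop (must be > 0)
    """
    return r_inf / (1.0 + half_param / work)


def work_for_fraction(half_param, fraction):
    """Work needed to reach a given fraction (0 < fraction < 1) of r_inf."""
    return half_param * fraction / (1.0 - fraction)


# Single-CPU vector values quoted in the abstract: (r_inf in Mflop/s, n_1/2 in flop).
VECTOR = {
    "dyadic": (70.0, 53.0),
    "all-vector triad": (107.0, 45.0),
    "two-vector, one-scalar triad": (148.0, 60.0),
}

# Two-CPU synchronization overheads quoted in the abstract: s_1/2 in flop.
SYNC = {
    "TSKSTART/TSKWAIT": 5700.0,
    "LOCKS": 4000.0,
    "EVENTS": 2000.0,
    "simplified LOCK in CAL": 220.0,
}

# Measured for TSKSTART/TSKWAIT only; reused for the other methods as an assumption.
R_INF_MIMD = 130.0

if __name__ == "__main__":
    for name, (r_inf, n_half) in VECTOR.items():
        rate = achieved_rate(r_inf, n_half, 64)
        print(f"{name}: {rate:.1f} Mflop/s at vector length 64")
    for name, s_half in SYNC.items():
        rate = achieved_rate(R_INF_MIMD, s_half, 6000)
        print(f"{name}: {rate:.1f} Mflop/s for a 6000-flop job")
```

Under this assumed model, a job of exactly $s_{1/2}$ operations runs at half of $r_\infty$, which matches the abstract's statement that roughly 6000 operations are needed to exceed 50% of the maximum rate with TSKSTART and TSKWAIT.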
