Abstract

For the 2020 Student Cluster Competition, we reproduced results from “MemXCT: Memory-Centric X-ray CT Reconstruction with Massive Parallelization” (Hidayetoğlu et al.). Reproducibility is of critical importance to the scientific community, not only to verify the correctness of results but also to show how easily others can understand and work with the published methods. MemXCT is an approach for image reconstruction in X-ray ptychography, which has a broad range of applications in materials science. It is not the only X-ray tomography algorithm; unlike compute-centric algorithms, however, it is designed to scale better by optimizing for memory bandwidth and memory latency, and it applies several key optimizations to ease memory pressure. In this article, we test the performance and strong scaling of MemXCT on 1 to 256 AMD CPU cores (1 to 4 nodes) and 1 to 16 Nvidia V100 GPUs (1 to 4 nodes). We confirm the impact of MemXCT’s optimizations. However, we find that the performance of some important loops in the MemXCT kernel is much lower on the AMD processors (with AVX2) in our CPU nodes than on the Intel CPUs (with AVX-512) used in the original article. We also confirm the MemXCT performance on Tesla V100 GPUs reported in the original article.
