The microarchitecture of general-purpose processors is continuously evolving to meet the computation and memory demands of emerging workloads. To this end, new circuitry is added to execute specific instructions, such as vector multiplication or string operations. These enhancements, together with support for multiple threads per core, have made simultaneous multithreading (SMT) processors dominant in the data center processor market. Among emerging workloads, machine learning (ML) is playing an increasingly important role in research domains such as biomedicine, economics, and the social sciences. This paper analyzes the efficiency of ML workloads running in SMT mode (two threads per core) versus running them in ST mode (single-threaded) with twice the number of cores. Experimental results on an Intel Xeon Skylake-X processor show SMT efficiencies between 80% and 100% across the studied workloads. These results support two main findings: i) latest-generation SMT processors are excellent candidates for executing ML workloads, as they achieve high SMT efficiency; and ii) if the performance of two major resources (i.e., the double-precision FP operator and the core's caches) were boosted, all the workloads would achieve almost perfect SMT efficiency. Moreover, the results show that there is still room to support more threads without adding extra hardware. These findings are intended to provide insights for designing future processors for ML workloads.
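One plausible way to compute the SMT efficiency compared above is sketched below. It assumes that both configurations expose the same number of hardware threads (N cores with two threads each in SMT mode, 2N cores with one thread each in ST mode) and that efficiency is the ratio of their execution times. The exact definition used in the paper may differ, and the function name `smt_efficiency` and the timing values are hypothetical.

```python
def smt_efficiency(t_st_2n_cores: float, t_smt_n_cores: float) -> float:
    """Return SMT efficiency as a fraction (1.0 corresponds to 100%).

    Assumed definition (hypothetical, for illustration):
      t_st_2n_cores: execution time in ST mode on 2N cores, one thread per core
      t_smt_n_cores: execution time in SMT mode on N cores, two threads per core
    Both runs use 2N hardware threads, so a value of 1.0 means the two
    sibling threads on one core perform as well as two separate cores.
    """
    if t_st_2n_cores <= 0 or t_smt_n_cores <= 0:
        raise ValueError("execution times must be positive")
    # Shorter SMT time relative to ST time means higher efficiency.
    return t_st_2n_cores / t_smt_n_cores


if __name__ == "__main__":
    # Hypothetical timings in seconds, not measurements from the paper.
    eff = smt_efficiency(10.0, 12.5)
    print(f"SMT efficiency: {eff:.0%}")
```

Under this assumed definition, an 80% efficiency means the SMT run takes 25% longer than the ST run with twice the cores, while 100% means the shared core loses nothing to resource contention between its two threads.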