Abstract

In recent years, Artificial Intelligence (AI) and Deep Learning (DL) Research and Development (R&D) has accelerated across academia, the research community, and industry. Training a Deep Neural Network (DNN) model requires huge amounts of data and computing resources, especially Graphics Processing Unit (GPU) accelerators, along with a DL framework such as TensorFlow and a programming language such as Python. Different distributions of these frameworks and languages are available, optimized for GPUs and for Central Processing Units (CPUs). Many papers, articles, and studies have been published on DNN training on GPUs; however, only a few are available on DNN training on CPUs, especially in HPC clusters. This paper presents a performance analysis and comparison of different distributions of Python and TensorFlow to determine which combination runs optimally on the CPU-only nodes of an HPC cluster. We used the ResNet50 [1], ResNet101 [1], and Inception-v3 [2] neural network models from tf_cnn_benchmarks [3] for the performance comparison. We further tuned the best-identified software combination using distributed training techniques across single and multiple nodes, and compared performance across different processors and architectures. We demonstrated up to a 7x performance improvement using the Intel Distribution for TensorFlow on a single node, and up to a 15.7x speedup on 16 nodes, across different CPU architectures.
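As a minimal sketch of the kind of CPU-only benchmark run the abstract describes, the snippet below assembles a tf_cnn_benchmarks command line. The flag names (`--model`, `--device`, `--batch_size`, `--num_intra_threads`, `--num_inter_threads`) follow the benchmark's standard interface, but the specific values shown are illustrative assumptions, not the authors' actual configuration.

```python
# Hypothetical sketch: building a CPU-only tf_cnn_benchmarks invocation.
# Thread counts and batch size below are assumed values for illustration.
import shlex


def build_benchmark_cmd(model="resnet50", batch_size=64,
                        intra_threads=16, inter_threads=2):
    """Assemble a tf_cnn_benchmarks command line for a CPU-only node."""
    args = [
        "python", "tf_cnn_benchmarks.py",
        "--device=cpu",
        f"--model={model}",                      # resnet50, resnet101, or inception3
        f"--batch_size={batch_size}",            # per-node batch size (assumed)
        "--num_batches=100",                     # short timed run (assumed)
        "--data_format=NHWC",                    # channels-last, typical on CPU
        f"--num_intra_threads={intra_threads}",  # ~ physical cores (assumed)
        f"--num_inter_threads={inter_threads}",  # parallel op scheduling (assumed)
    ]
    return shlex.join(args)


print(build_benchmark_cmd())
```

Tuning the intra-op and inter-op thread counts to the node's core topology is one of the main levers for CPU training throughput, which is why they are exposed as parameters here.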
