Abstract

In recent years, Artificial Intelligence (AI) and Deep Learning (DL) Research and Development (R&D) has accelerated across academia, the research community, and industry. Training a Deep Neural Network (DNN) model requires huge amounts of data and computing resources, especially Graphics Processing Unit (GPU) accelerators, along with a DL framework such as TensorFlow and a programming language such as Python. Different distributions of these frameworks and languages are available, optimized for GPUs and for Central Processing Units (CPUs). Many papers, articles, and studies have been published on DNN training on GPUs; however, only a few are available on DNN training on CPUs, especially in HPC clusters. This paper presents a performance analysis and comparison of different distributions of Python and TensorFlow to determine which combination runs optimally on the CPU-only nodes of an HPC cluster. We used the ResNet50 [1], ResNet101 [1], and Inception-v3 [2] neural network models from tf_cnn_benchmarks [3] for the performance comparison. We further tuned the best-identified software combination using distributed training techniques across single and multiple nodes, and compared performance across different processors and architectures. We demonstrated up to a 7x performance improvement using the Intel Distribution for TensorFlow on a single node, and up to a 15.7x speedup on 16 nodes, across different CPU architectures.
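As a minimal sketch of the kind of CPU-only benchmark run the abstract describes, the snippet below assembles a tf_cnn_benchmarks command line. The flag names (`--model`, `--device`, `--batch_size`, `--num_intra_threads`, `--num_inter_threads`) follow the benchmark's standard interface, but the specific values shown are illustrative assumptions, not the authors' actual configuration.

```python
# Hypothetical sketch: building a CPU-only tf_cnn_benchmarks invocation.
# Thread counts and batch size below are assumed values for illustration.
import shlex


def build_benchmark_cmd(model="resnet50", batch_size=64,
                        intra_threads=16, inter_threads=2):
    """Assemble a tf_cnn_benchmarks command line for a CPU-only node."""
    args = [
        "python", "tf_cnn_benchmarks.py",
        "--device=cpu",
        f"--model={model}",                      # resnet50, resnet101, or inception3
        f"--batch_size={batch_size}",            # per-node batch size (assumed)
        "--num_batches=100",                     # short timed run (assumed)
        "--data_format=NHWC",                    # channels-last, typical on CPU
        f"--num_intra_threads={intra_threads}",  # ~ physical cores (assumed)
        f"--num_inter_threads={inter_threads}",  # parallel op scheduling (assumed)
    ]
    return shlex.join(args)


print(build_benchmark_cmd())
```

Tuning the intra-op and inter-op thread counts to the node's core topology is one of the main levers for CPU training throughput, which is why they are exposed as parameters here.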
