Abstract

Parallel computing can speed up the training of deep learning models. In this paper, the classic ResNet architecture is chosen to test the effectiveness of data parallelism for image classification, and experimental results are reported for a 6-GPU environment. The paper argues that several factors should be weighed when building cost-effective hardware configurations to accelerate model training in practical application scenarios. Communication costs are not negligible, because today's large computing clusters are mainly provided through cloud computing. Another crucial point is that the number of CUDA cores is the primary hardware basis for GPU acceleration; consequently, for some graphics cards, larger video memory paired with fewer CUDA cores may not improve acceleration. In addition, model performance under parallel computing is an issue that cannot be disregarded. Because of the parallel strategy's limitations, hyperparameters must be tuned while speeding up model training; a lower learning rate and a smaller batch size are more likely to preserve the model's performance. The experimental conclusions of this paper can support the design of an appropriate hardware configuration scheme.
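To make the setting concrete, the following is a minimal sketch of data-parallel ResNet training on a single multi-GPU node, written in PyTorch. The framework, dataset (CIFAR-10), batch size, and learning rate are illustrative assumptions, not the paper's exact configuration; the structure only shows how each mini-batch is split across the available GPUs.

```python
# Minimal data-parallel ResNet training sketch (assumed PyTorch setup,
# not the paper's exact configuration).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms


def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Standard ResNet-50 from torchvision stands in for the "classic" ResNet.
    model = models.resnet50(num_classes=10).to(device)

    # Replicate the model across all visible GPUs; each GPU processes a
    # slice of every mini-batch (data parallelism).
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)

    # CIFAR-10 is a placeholder image-classification workload.
    transform = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
    train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                                 transform=transform)
    # The global batch size is divided among the GPUs by DataParallel.
    loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)

    criterion = nn.CrossEntropyLoss()
    # A relatively low learning rate, in line with the abstract's note that
    # smaller learning rates and batch sizes help preserve model quality.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    model.train()
    for epoch in range(2):  # short run for illustration only
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: last batch loss {loss.item():.4f}")


if __name__ == "__main__":
    main()
```

In a multi-node or cloud-cluster setting of the kind the abstract discusses, `DistributedDataParallel` would typically replace `DataParallel`, and inter-GPU gradient synchronization is where the communication costs mentioned above arise.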
