Abstract

Parallel computing can speed up the training of deep learning models. This paper evaluates the effectiveness of data parallelism for image classification using the classic ResNet architecture, with experiments conducted in a 6-GPU environment. The results suggest that several factors must be weighed when assembling an affordable hardware configuration to accelerate model training in practical application scenarios. Communication costs are not negligible, because today's large computing clusters are primarily provided through cloud computing. Another key point is that the number of CUDA cores is the main hardware basis for GPU acceleration; for some graphics cards, larger video memory paired with fewer CUDA cores may not yield better acceleration. In addition, the impact of parallel computing on model performance must not be disregarded: because of the limitations of the parallel strategy, hyperparameters need to be retuned when training is accelerated, and a lower learning rate and smaller batch size are more likely to preserve model performance. The experimental conclusions of this paper can support the design of an appropriate hardware configuration scheme.
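
For illustration, the sketch below shows one common way to set up data-parallel ResNet training across multiple GPUs. The abstract does not name a framework, dataset, or hyperparameter values, so PyTorch, CIFAR-10, and the batch size and learning rate shown here are assumptions, not the paper's actual configuration.

```python
# Minimal data-parallel ResNet training sketch (PyTorch assumed; the paper does
# not specify a framework). Batch size and learning rate are illustrative only.
import torch
import torch.nn as nn
from torchvision import datasets, transforms, models

def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # A standard ResNet-50 from torchvision; the paper only says "ResNet".
    model = models.resnet50(num_classes=10)

    # Data parallelism: replicate the model on every visible GPU and split
    # each input batch across them; gradients are gathered on the default GPU.
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    model = model.to(device)

    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
    ])
    train_set = datasets.CIFAR10("./data", train=True, download=True,
                                 transform=transform)
    # The global batch size is divided evenly among the GPUs.
    loader = torch.utils.data.DataLoader(train_set, batch_size=256,
                                         shuffle=True, num_workers=4)

    criterion = nn.CrossEntropyLoss()
    # A modest learning rate, in line with the abstract's note that lower
    # learning rates are safer when the batch is split across devices.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    main()
```

Note that per-batch gradient synchronization in a setup like this is exactly where the communication costs mentioned above arise, which is why interconnect and cloud-cluster topology matter alongside raw CUDA core counts.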
