Design of Deep Learning VLIW Processor for Image Recognition

Lin Li, Juan Wu, Shengbing Zhang

doi:10.1051/jnwpu/20203810216

Abstract

To meet the demands of high-resolution image recognition and efficient localization processing in the aviation and aerospace fields, and to address the insufficient parallelism of existing research, an extensible multiprocessor-cluster deep learning processor architecture based on VLIW is designed by optimizing the computation of each layer of the deep convolutional neural network model. The design adopts parallel processing of feature maps and neurons, instruction-level parallelism based on very long instruction word (VLIW), data-level parallelism across multiprocessor clusters, and pipelining. Test results on an FPGA prototype system show that the processor can effectively run image classification and object detection applications. The peak performance of the processor reaches 128 GOP/s at 200 MHz. On the selected benchmarks, the processor runs at least about 12X faster than the CPU and 7X faster than the GPU. Compared with the results of the software framework, the average error in the processor's test accuracy is less than 1%.
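The peak figure can be put in per-cycle terms with a short calculation. The sketch below only divides the stated peak throughput by the stated clock rate. The function name is illustrative, and the conversion to multiply-accumulate (MAC) units assumes the common convention of counting one MAC as two operations, which the abstract does not confirm.

```python
def ops_per_cycle(peak_gops: float, clock_mhz: float) -> float:
    """Operations completed per clock cycle at peak throughput."""
    return (peak_gops * 1e9) / (clock_mhz * 1e6)


# Figures stated in the abstract: 128 GOP/s peak at 200 MHz.
peak_ops = ops_per_cycle(128, 200)

# Assumption (not from the paper): one MAC counts as two operations.
peak_macs = peak_ops / 2

print(f"{peak_ops:.0f} operations per cycle")
print(f"{peak_macs:.0f} MACs per cycle under the 2-ops-per-MAC assumption")
```

Under these assumptions the processor would need to sustain 640 operations per cycle across all clusters to reach its peak, which indicates the degree of combined parallelism the architecture targets.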

Summary

Introduction

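The parameters are obtained through offline training. Training uses Caffe[12], a deep learning framework currently popular in industry, and the hardware environment comprises a CPU (Core i7-6700HQ) and a GPU (GTX 960M). The benchmarks are deep convolutional neural network models, including LeNet-5[1-2] and AlexNet[1,3] with modified network structures, MobileNet[5], and SSD300 + MobileNet[4-5]. The training and test datasets are MNIST[2], CIFAR-10[13], Stanford Dogs[14], PASCAL VOC2007[15], and VOC2012[15]. After training, the neural network parameters are extracted from the resulting Caffemodel and, after preprocessing, are used for computation on the processor. The Prototxt file of the deep neural network model to be deployed is mapped onto the processor by a software compiler, which generates the VLIW instruction sequence that the processor runs. In the image classification tests, LeNet-5[1-2], AlexNet[1,3], and MobileNet[5] serve as benchmarks and are tested on the MNIST[2], CIFAR-10[13], and Stanford Dogs[14] datasets, respectively; Table 2 compares the resulting test accuracy with the accuracy measured by the Caffe[12] software framework. During the image classification experiments, the timing function of Caffe[12] was used to measure the time each benchmark takes to process one image from its dataset on the CPU and on the GPU of the hardware environment, and the running time of the deep learning VLIW processor was obtained by simulation; Table 3 compares these times.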
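The two comparisons described here, accuracy against the software framework and per-image time against the CPU and GPU, reduce to simple metrics. The following is a minimal sketch of how such metrics could be computed; the function names are hypothetical, and every number in it is a placeholder chosen for illustration, not a value from Table 2 or Table 3.

```python
from statistics import mean


def mean_abs_accuracy_gap(processor_acc, reference_acc):
    """Mean absolute difference, in percentage points, between two accuracy lists."""
    if len(processor_acc) != len(reference_acc):
        raise ValueError("accuracy lists must have the same length")
    return mean(abs(p - r) for p, r in zip(processor_acc, reference_acc))


def speedup(baseline_time, processor_time):
    """How many times faster the processor is than the baseline (same time unit)."""
    return baseline_time / processor_time


# Placeholder accuracies in percent, one per benchmark. NOT the paper's data.
reference_acc = [99.0, 80.0, 70.0]   # hypothetical software-framework results
processor_acc = [98.5, 79.5, 69.0]   # hypothetical processor results

gap = mean_abs_accuracy_gap(processor_acc, reference_acc)
print(f"average accuracy gap: {gap:.2f} percentage points")

# Placeholder per-image times in milliseconds. NOT the paper's data.
print(f"speedup over baseline: {speedup(24.0, 2.0):.1f}X")
```

Averaging the absolute gap across benchmarks is one plausible reading of "average error of the test accuracy"; the paper may define it differently.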


Journal: Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
Publication Date: Feb 1, 2020
Citations: 1
License type: CC BY 4.0
