Abstract

Detecting and classifying small real-world traffic signs in large input images is difficult because they occupy far fewer pixels than larger targets. To address this challenge, we proposed a deep-learning-based model (Dense-RefineDet) that builds on a single-shot object-detection framework (RefineDet) to maintain a suitable accuracy–speed trade-off. We constructed a dense-connection-based transfer-connection block that combines high-level feature layers with low-level feature layers, so that the higher layers contribute additional contextual information. Additionally, we presented an anchor-design method that provides suitable anchors for detecting small traffic signs. Experiments using the Tsinghua-Tencent 100K dataset demonstrated that Dense-RefineDet achieved competitive accuracy at high detection speed (0.13 s/frame) for small-, medium-, and large-scale traffic signs (recall: 84.3%, 95.2%, and 92.6%; precision: 83.9%, 95.6%, and 94.0%, respectively). Moreover, experiments using the Caltech pedestrian dataset indicated that Dense-RefineDet achieved a miss rate of 54.03% (pedestrian height > 20 pixels), outperforming other state-of-the-art methods.

Highlights

  • Traffic sign recognition plays a key role in advanced driver-assistance systems and automatic driving and is a hot topic in computer vision research and applications

  • We found that placing the anchor centers of each feature-map cell at the center of that cell was not optimal for detecting small traffic signs, which motivated our use of a new anchor-design method

  • For detecting small traffic signs, we found that the anchor-design method of the previous study [21] was not optimal: its anchor shapes were suited to objects whose ground-truth bounding boxes (GTBs) have different aspect ratios, whereas real-world traffic signs usually share similar aspect ratios
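
The baseline anchor layout that the last two highlights describe (anchors centered on each feature-map cell, with several aspect ratios per cell) can be sketched as follows. This is only an illustrative sketch of that conventional scheme, not the anchor-design method proposed in the paper; the function names `cell_centered_anchors` and `iou`, and the stride, scale, and aspect-ratio values, are assumptions chosen for the example.

```python
import math


def cell_centered_anchors(fmap_h, fmap_w, stride, scale, aspect_ratios):
    """Conventional anchors: one set per feature-map cell, centered on the cell.

    Returns a list of (cx, cy, w, h) tuples in input-image pixels.
    All parameter values used below are hypothetical, not taken from the paper.
    """
    anchors = []
    for row in range(fmap_h):
        for col in range(fmap_w):
            # Anchor center coincides with the center of the cell.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for ar in aspect_ratios:
                # ar = width / height; the anchor area stays at scale ** 2.
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                anchors.append((cx, cy, w, h))
    return anchors


def iou(a, b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


# Hypothetical 2x2 feature map with stride 8 and three aspect ratios per cell.
anchors = cell_centered_anchors(2, 2, stride=8, scale=32, aspect_ratios=[0.5, 1.0, 2.0])
```

With this layout, a roughly square sign can only be matched well by the single square anchor in each cell, and its best IoU depends on how close the sign's center lies to a cell center; the helper `iou` can be used to check that for any hypothetical ground-truth box.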


Summary

Introduction

Traffic sign recognition plays a key role in advanced driver-assistance systems and automatic driving and is a hot topic in computer vision research and applications. Previous studies applied the convolutional neural network (CNN) to either the detection or the classification process [2,9,10,11], whereas others regarded traffic sign recognition as a common object-detection task [5,6,7,12]. The latter methods used one CNN structure to locate and classify traffic signs simultaneously; the challenge lies in accurately locating and classifying small traffic signs in large input images. Compared with enlarging small regions, which usually decreases speed, exploiting contextual information is preferred because it provides additional information about related target objects [14,15,16]. This approach has been widely used in CNN-based small-object detection methods, for example through deconvolution or atrous convolution [17]. Experiments using the Tsinghua-Tencent 100K and Caltech pedestrian datasets demonstrated that the Dense-RefineDet model enhanced the detection accuracy of the original RefineDet and achieved performance competitive with other state-of-the-art methods for detecting real-world traffic signs and pedestrians.

Context-Related CNN-Based Object-Detection Methods
CNN-Based Traffic Sign-Detection and -Classification Methods
RefineDet Revisited
Framework Overview
Anchor Design
Building the Dense-TCB
Datasets and Experimental Setup
Performance on the Tsinghua-Tencent 100K Dataset
Methods
Performance on the Caltech Pedestrian Dataset
Ablation Study
Conclusions
