Abstract

The operation of trains is substantially jeopardized by the presence of obstacles on railway tracks. Recent research on vision-based railway obstacle detectors has emphasized large neural network models in pursuit of high accuracy while neglecting power consumption and training costs. To accelerate the model updating, training, and deployment cycle in the railway safety guarantee system, this article proposes a lightweight three-stage detection framework that identifies obstacles in a single railway image. It mainly comprises a coarse region proposal (CRP) module, a lightweight railway obstacle detection network (RODNet), and a postprocessing stage. In the CRP module, the binarized normed gradient (BING) detector is applied to the image at two different scales to generate proposal regions more effectively. The intersection-over-minimum (IoM) metric is then introduced to filter and merge the generated subregions. RODNet is a more lightweight and effective detection model built through substantial improvements to YOLOv4-tiny, and it simultaneously predicts bounding boxes on the whole image and within each subregion. Its backbone is rebuilt with one-shot aggregation and an attention mechanism to reduce parameters while maintaining feature extraction ability, and residual blocks are integrated into its neck and head to improve detection performance. Finally, in the postprocessing stage, a series of operations fuses, removes, and suppresses predicted boxes on the boundaries of the candidate subregions to reduce false-positive results. Experimental results on the railway obstacle dataset show that RODNet has 48% fewer parameters and 29% fewer floating-point operations (FLOPs) than YOLOv4-tiny, with at least a 68.2% improvement in inference speed on the CPU. Meanwhile, the proposed framework achieves 80.3 mean average precision (mAP) at 65.9 ms per image on a low-power graphics processing unit (GPU), a performance competitive with or even better than state-of-the-art large detection models.
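
To illustrate the IoM-based filtering described above, the following minimal Python sketch merges overlapping proposal subregions whose intersection-over-minimum exceeds a threshold. The (x1, y1, x2, y2) box format, the greedy merge-by-union strategy, the function names, and the 0.7 threshold are assumptions chosen for illustration; they are not the authors' implementation.

    # Hedged sketch of IoM-based proposal filtering/merging; box format,
    # threshold, and merge strategy are illustrative assumptions.
    def iom(box_a, box_b):
        """Intersection-over-minimum of two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        smaller = min(area_a, area_b)
        return inter / smaller if smaller > 0 else 0.0

    def merge_proposals(boxes, iom_thresh=0.7):
        """Greedily merge proposal subregions whose IoM exceeds the threshold."""
        merged = []
        # Process larger proposals first so small boxes fold into bigger ones.
        for box in sorted(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True):
            for i, kept in enumerate(merged):
                if iom(box, kept) >= iom_thresh:
                    # Replace the kept box with the union of the two boxes.
                    merged[i] = (min(kept[0], box[0]), min(kept[1], box[1]),
                                 max(kept[2], box[2]), max(kept[3], box[3]))
                    break
            else:
                merged.append(box)
        return merged

Compared with the usual intersection-over-union, dividing by the smaller area makes a proposal fully contained in another score 1.0, so nested BING proposals collapse into a single subregion before being passed to RODNet.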
