With the continuous development of autonomous driving technology, high-precision road obstacle detection is crucial for ensuring traffic safety and a good driving experience. However, traditional obstacle detection methods often perform poorly in complex driving scenarios, such as those involving obstacle motion and occlusion. To address this problem, this study proposes a road obstacle detection method based on a two-stream convolutional neural network model, aiming to overcome the limitations of traditional methods in capturing spatiotemporal features and handling complex situations. Our approach rests on the following innovations. First, we introduce a two-stream convolutional neural network structure in which one stream extracts spatial features from the contour information of the obstacle frames, and the other extracts temporal features from the temporal stream information. This two-stream structure fully captures both the appearance and the dynamics of obstacles, thereby improving detection accuracy. Second, we design a feature fusion module that fuses the two feature types to obtain richer obstacle representations. In addition, we propose a new loss function, a clustering loss, to better optimize model training by reducing intra-class variation and increasing inter-class differences, thereby improving the generalization performance of the model. In the experimental section, we conduct extensive experimental analysis on the Cityscapes and BDD100K datasets. The results show that our model achieves significant performance improvements over both a traditional convolutional neural network method and YOLOv5 in a variety of scenarios, including stationary obstacles, moving obstacles, and occluded obstacles. Specifically, our method improves the recognition rate by 4.8% to 14.5% on the Cityscapes dataset and by 6.5% to 12.8% on the BDD100K dataset across these scenarios. Our model also performs well on small datasets, exhibiting higher generalization ability and robustness.
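To make the described architecture concrete, the following is a minimal PyTorch sketch of a two-stream network with a concatenation-based fusion module and a center-loss-style clustering objective. All layer sizes, the stacked-temporal-map input, the fusion design, and the exact form of the clustering loss are illustrative assumptions; the abstract does not specify them, so this is a plausible reading rather than the authors' implementation.

```python
# Minimal sketch of the two-stream idea from the abstract (PyTorch).
# Layer sizes, fusion by concatenation, and the clustering loss are
# illustrative assumptions, not the authors' exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamNet(nn.Module):
    """Spatial stream on single frames; temporal stream on a stack of
    temporal maps (a stand-in for the paper's temporal-stream input)."""
    def __init__(self, num_classes: int = 2, temporal_len: int = 5):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.spatial = stream(3)               # RGB appearance/contour cues
        self.temporal = stream(temporal_len)   # stacked temporal maps
        self.fuse = nn.Linear(64 + 64, 128)    # feature fusion module
        self.head = nn.Linear(128, num_classes)

    def forward(self, frame, temporal_stack):
        f = torch.cat([self.spatial(frame),
                       self.temporal(temporal_stack)], dim=1)
        fused = F.relu(self.fuse(f))
        return self.head(fused), fused         # logits + fused features

class ClusteringLoss(nn.Module):
    """Center-loss-style objective: pull features toward their class center
    (reduces intra-class variation) and push centers at least a margin
    apart (increases inter-class differences). A plausible reading of the
    abstract's 'clustering loss', not its published formula."""
    def __init__(self, num_classes: int, feat_dim: int, margin: float = 1.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin = margin

    def forward(self, feats, labels):
        # Intra-class term: squared distance to the assigned class center.
        intra = (feats - self.centers[labels]).pow(2).sum(dim=1).mean()
        # Inter-class term: hinge penalty on center pairs closer than margin.
        dists = torch.cdist(self.centers, self.centers)
        off_diag = dists[~torch.eye(len(self.centers), dtype=torch.bool)]
        inter = F.relu(self.margin - off_diag).mean()
        return intra + inter

# Usage on dummy batches: combine with cross-entropy on the logits.
model = TwoStreamNet(num_classes=2, temporal_len=5)
cluster_loss = ClusteringLoss(num_classes=2, feat_dim=128)
frame = torch.randn(4, 3, 64, 64)
stack = torch.randn(4, 5, 64, 64)
labels = torch.randint(0, 2, (4,))
logits, feats = model(frame, stack)
loss = F.cross_entropy(logits, labels) + 0.1 * cluster_loss(feats, labels)
loss.backward()
```

In this sketch the clustering loss has two terms: the intra-class term pulls each fused feature toward its learnable class center, while the hinge term penalizes any pair of centers closer than the margin, matching the abstract's stated goals of reducing intra-class variation and increasing inter-class differences.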