Abstract
Stereo vision is used in many application areas, such as robot-assisted manufacturing processes. Recently, many efficient stereo matching algorithms based on deep learning have been developed to overcome the limitations of traditional correspondence point analysis, whose challenges include texture-poor and optically non-cooperative objects. One of these end-to-end learning algorithms is the Adaptive Aggregation Network (AANet/AANet+), which is divided into five steps: feature extraction, cost volume construction, cost aggregation, disparity computation and disparity refinement. Because different components can be combined for these steps, it is easy to build a customised stereo matching model. Our goal is to develop efficient learning methods for robot-assisted manufacturing processes with cross-domain data streams, in order to improve recognition tasks and process optimisation. To this end, we investigated the usability and efficiency of AANet+ on our own test dataset, recorded with different measurement setups of a passive stereo system. The inputs to AANet+ are rectified stereo pairs from the test dataset and a pre-trained model. Instead of generating our own training dataset, we used two pre-trained models based on the KITTI-2015 and Scene Flow datasets. Our results show that the pre-trained model based on the Scene Flow dataset predicts disparities with better object delineation. Because the inputs are out-of-distribution, AANet+ produces reliable disparity predictions only for test datasets recorded with a parallel measurement setup. We compared the results with two traditional stereo matching algorithms (semi-global block matching and DAISY). Compared with the traditionally computed disparity maps, AANet+ robustly detects texture-poor and optically non-cooperative objects.
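To make the middle three steps named above more concrete (cost volume construction, cost aggregation and disparity computation), the following minimal Python sketch runs them on a single rectified scanline. It is a hand-crafted toy and not AANet+ code: the function names, the absolute-difference cost, the fixed box-filter aggregation and the sample intensities are all assumptions chosen for illustration, whereas AANet+ learns its features and adapts its aggregation.

```python
def cost_volume(left, right, max_disp):
    """Absolute-difference matching cost for every disparity d and column x."""
    inf = float("inf")  # marks positions where x - d falls outside the image
    width = len(left)
    return [
        [abs(left[x] - right[x - d]) if x - d >= 0 else inf for x in range(width)]
        for d in range(max_disp + 1)
    ]


def aggregate(costs, radius=1):
    """Average each disparity slice over a small horizontal window (box filter)."""
    aggregated = []
    for row in costs:
        smoothed = []
        for x in range(len(row)):
            window = row[max(0, x - radius): x + radius + 1]
            smoothed.append(sum(window) / len(window))
        aggregated.append(smoothed)
    return aggregated


def disparity(costs):
    """Winner-take-all: per column, pick the disparity with the lowest cost."""
    width = len(costs[0])
    return [min(range(len(costs)), key=lambda d: costs[d][x]) for x in range(width)]


# Hypothetical scanline: the left row shows the right row shifted by 2 pixels.
right = [10, 50, 20, 80, 30, 60, 90, 40]
left = [0, 0, 10, 50, 20, 80, 30, 60]

disp = disparity(aggregate(cost_volume(left, right, max_disp=3)))
print(disp[3:])
```

Away from the left image border, the recovered disparity equals the 2-pixel shift. On a uniform, texture-poor row every disparity would yield the same cost, and the winner-take-all step would have no basis for choosing, which is the failure mode of traditional matching that the abstract refers to.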