Abstract

Single-shot semantic bounding-box detectors trained in a supervised manner are popular in computer-vision-aided visual inspections. These methods have several key limitations: (1) bounding boxes capture too much background, especially when images undergo perspective transformation; (2) domain-specific data are scarce and costly to label; and (3) detection results on videos or multi-frame data are often redundant or incorrect, and selecting the best detection and screening for outliers is a nontrivial task. Recent developments in commercial augmented reality and robotic hardware can be leveraged to support inspection tasks. A common capability of these platforms is the ability to obtain image sequences together with camera poses. In this work, the authors leverage pose information as a prior to address the limitations of existing supervised, single-shot semantic detectors for visual inspection. The authors propose an unsupervised semantic segmentation method (USP) that builds on unsupervised image segmentation by differentiable feature clustering, coupled with a novel outlier-rejection and stochastic consensus mechanism for mask refinement. USP was experimentally validated on a spalling quantification task using a mixed reality headset (Microsoft HoloLens 2). A sensitivity study was also conducted to evaluate the performance of USP under environmental and operational variations.
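
To make the differentiable feature clustering idea concrete, the following is a minimal, hypothetical sketch in the spirit of that line of work (e.g., Kim et al., 2020), not the authors' USP implementation: a small CNN produces per-pixel cluster logits for a single image, and training alternates between taking argmax pseudo-labels and minimizing a cross-entropy plus spatial-continuity loss. All names, layer sizes, and the continuity weight `mu` are illustrative assumptions.

```python
# Hedged sketch: unsupervised segmentation via differentiable feature
# clustering on a single image. Illustrative only; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureClusterNet(nn.Module):
    """Small CNN mapping an image to per-pixel cluster logits."""
    def __init__(self, in_ch=3, n_clusters=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(), nn.BatchNorm2d(64),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.BatchNorm2d(64),
            nn.Conv2d(64, n_clusters, 1), nn.BatchNorm2d(n_clusters),
        )

    def forward(self, x):
        return self.body(x)  # (B, n_clusters, H, W)

def segment_unsupervised(image, n_iters=100, lr=0.1, mu=5.0):
    """Fit the net to one image; cluster labels emerge without supervision.

    image: float tensor of shape (1, 3, H, W), e.g. normalized to [0, 1].
    """
    net = FeatureClusterNet()
    opt = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    for _ in range(n_iters):
        logits = net(image)                 # (1, C, H, W)
        target = logits.argmax(dim=1)       # argmax pseudo-labels (1, H, W)
        # Spatial continuity: penalize differences between neighboring pixels
        # so clusters form contiguous regions instead of speckle.
        sm = F.softmax(logits, dim=1)
        cont = (sm[..., 1:, :] - sm[..., :-1, :]).abs().mean() \
             + (sm[..., :, 1:] - sm[..., :, :-1]).abs().mean()
        loss = ce(logits, target) + mu * cont
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net(image).argmax(dim=1)         # final per-pixel cluster map
```

In a multi-frame inspection setting such as the one described above, a per-frame mask like this could then be reprojected using the known camera poses and fused by an outlier-rejection and consensus step; that refinement stage is the paper's contribution and is not reproduced here.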
