Virtual ground truth, and pre-selection of 3D interest points for improved repeatability evaluation of 2D detectors

Simon R Lang, David M Powers, Martin H Luerssen, W Anggono

doi:10.1051/matecconf/201927702032

Abstract

In Computer Vision, simple features are found using classifiers called interest point (IP) detectors, which are often used to track features as the scene changes. For 2D-based classifiers it has been intuitive to measure the reliability of repeated points with 2D metrics, given the difficulty of establishing ground truth beyond 2D. The aim is to bridge the gap between 2D classifiers and 3D environments, and to improve performance analysis of 2D IP classification on 3D objects. This paper builds on existing work with 3D scanned and artificial models, testing conventional 2D feature detectors with the assistance of virtualised 3D scenes. Depth in the virtual space is leveraged to pre-select the closest repeatable points, in both 2D and 3D contexts, before repeatability is measured. This more reliable ground truth is used to analyse testing configurations with a single-model and a 12-model dataset, across affine transforms comprising rotation about the x, y and z axes and scaling in x and y, with 9 well-known IP detectors. The virtual scene's ground truth demonstrates that 3D pre-selection eliminates a large portion of the false positives that are normally counted as repeated in 2D configurations. The results indicate that 3D virtual environments can assist in comparing the performance of conventional detectors when their applications are extended to 3D environments, and can lead to better classification of features when testing the performance of prospective classifiers. A ROC-based informedness measure also highlights tradeoffs in 2D/3D performance compared with conventional repeatability measures.
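Informedness is the ROC-based measure mentioned in the abstract. Its standard definition is the true positive rate plus the true negative rate minus one (equivalently TPR minus FPR), so chance-level labelling scores 0 and perfect labelling scores 1. The sketch below only illustrates that definition; the function name is hypothetical, and how the paper maps detected and repeated points onto the four counts is not stated in this summary.

```python
def informedness(tp: int, fp: int, tn: int, fn: int) -> float:
    """Bookmaker informedness = TPR + TNR - 1 (equivalently TPR - FPR)."""
    positives = tp + fn  # all cases whose true label is "repeated"
    negatives = tn + fp  # all cases whose true label is "not repeated"
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    tpr = tp / positives  # recall, sensitivity
    tnr = tn / negatives  # inverse recall, specificity
    return tpr + tnr - 1.0


# A reasonably good classifier: roughly 0.7, well above chance.
print(informedness(tp=80, fp=10, tn=90, fn=20))
# Labelling every point "repeated" gives perfect recall but zero informedness.
print(informedness(tp=100, fp=100, tn=0, fn=0))
```

The second call shows why a measure that also credits true negatives can expose tradeoffs that a ratio of repeated points alone would hide.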

Highlights

In Computer Vision (CV), the establishment of ground truth so that new feature classification algorithms can be properly measured is an ongoing topic of research
One would expect interest point detectors that can utilise scene depth to achieve more reliable and higher repeatability rates, since false positives can be avoided and better candidates chosen
This paper explores the topic of interest point (IP) detectors and their repeatability across multiple scene transformations in virtualised 3D spaces with the assistance of 2D and 3D preselection
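The expectation that depth lets false positives be avoided can be illustrated with a toy case. This is not the paper's pre-selection procedure; it is a minimal sketch assuming a pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), a hypothetical 3D tolerance `delta`, and two detections already mapped into the reference view together with their rendered depths, which a virtual scene can supply. The helper names `backproject`, `repeated_2d` and `repeated_3d` are illustrative.

```python
import numpy as np


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Pinhole back-projection of pixel (u, v) at depth Z into camera coordinates."""
    z = float(depth)
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])


def repeated_2d(p, q, eps: float = 1.5) -> bool:
    """Schmid-style test: image-plane distance strictly below eps pixels."""
    return bool(np.hypot(p[0] - q[0], p[1] - q[1]) < eps)


def repeated_3d(P, Q, delta: float) -> bool:
    """Hypothetical depth-aware test: 3D Euclidean distance strictly below delta."""
    return bool(np.linalg.norm(np.asarray(P) - np.asarray(Q)) < delta)


# Hypothetical virtual-camera intrinsics (focal lengths and principal point, pixels).
K = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# Two detections one pixel apart on an occlusion boundary, both in the reference
# view: p lies on a foreground edge (depth 2.0), q on the background (depth 5.0).
p, p_depth = (100.0, 100.0), 2.0
q, q_depth = (101.0, 100.0), 5.0

print(repeated_2d(p, q))  # True: accepted as repeated in 2D
print(repeated_3d(backproject(*p, p_depth, **K),
                  backproject(*q, q_depth, **K), delta=0.05))  # False: rejected in 3D
```

The pair passes the 2D test at ε = 1.5 yet lies several scene units apart, which is the kind of false positive a depth-aware check can discard.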

Summary

Introduction

In Computer Vision (CV), the establishment of ground truth so that new feature classification algorithms can be properly measured is an ongoing topic of research. Schmid’s metric for evaluating a set of detectors K classifies points between two pixel arrays, $\vec{x}_1$ and $\vec{x}_i$, as either repeated or not, and uses a ratio of true positives and true negatives to measure performance. Classification is determined by a threshold on the radial distance $\varepsilon$ around each point in the reference scene $\vec{x}_1$. A homography $H_{1i}$ relating $\vec{x}_1$ and $\vec{x}_i$ maps points into a common frame, so that their distances can be compared with the threshold and repeated points identified. The default threshold, $\varepsilon = 1.5$, tolerates an error of one pixel in any direction, known as the Moore neighborhood (diagonal neighbours lie $\sqrt{2} \approx 1.41$ pixels away, while the next ring of pixels starts at 2), and is considered by Schmid, and by researchers who apply this metric in general, to be the optimal tradeoff. Points that do not lie in the view area shared by both images are removed from the validation process, as they have no valid repeatable point candidates.
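The metric can be sketched in code. This is a minimal sketch assuming the commonly cited form of Schmid et al.'s repeatability rate, $r_i(\varepsilon) = |R_i(\varepsilon)| / \min(n_1, n_i)$, where $n_1$ and $n_i$ count only points in the shared view area. The greedy one-to-one matching and the helper names `project`, `inside`, `repeatability` and `offsets_within` are illustrative assumptions, not the paper's code. The last helper checks that $\varepsilon = 1.5$ admits exactly the eight Moore neighbours.

```python
import numpy as np


def project(H: np.ndarray, pts) -> np.ndarray:
    """Map an (N, 2) array of pixel coordinates through a 3x3 homography H."""
    pts = np.asarray(pts, dtype=float).reshape(-1, 2)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # back to Cartesian coordinates


def inside(pts: np.ndarray, shape) -> np.ndarray:
    """Boolean mask of points lying within an image of shape (height, width)."""
    h, w = shape
    return (pts[:, 0] >= 0) & (pts[:, 0] < w) & (pts[:, 1] >= 0) & (pts[:, 1] < h)


def repeatability(x1, xi, H1i, shape1, shapei, eps: float = 1.5) -> float:
    """Fraction of points repeated within eps pixels, over the shared view area."""
    x1 = np.asarray(x1, dtype=float).reshape(-1, 2)
    xi = np.asarray(xi, dtype=float).reshape(-1, 2)
    # Drop points whose correspondent would fall outside the other image.
    x1_in_i = project(H1i, x1)
    p1 = x1_in_i[inside(x1_in_i, shapei)]
    pi = xi[inside(project(np.linalg.inv(H1i), xi), shape1)]
    n1, ni = len(p1), len(pi)
    if min(n1, ni) == 0:
        return 0.0
    # Pairwise distances, measured in the frame of image i.
    d = np.linalg.norm(p1[:, None, :] - pi[None, :, :], axis=2)
    # Greedy one-to-one matching (assumption): closest pairs first, each point used once.
    candidates = np.argwhere(d < eps)
    order = np.argsort(d[candidates[:, 0], candidates[:, 1]])
    used1, usedi, repeated = set(), set(), 0
    for a, b in candidates[order]:
        if a not in used1 and b not in usedi:
            used1.add(a)
            usedi.add(b)
            repeated += 1
    return repeated / min(n1, ni)


def offsets_within(eps: float = 1.5):
    """Integer pixel offsets (excluding the centre) strictly closer than eps."""
    r = int(np.ceil(eps))
    return [(dx, dy) for dx in range(-r, r + 1) for dy in range(-r, r + 1)
            if (dx, dy) != (0, 0) and np.hypot(dx, dy) < eps]


print(len(offsets_within(1.5)))  # 8: the Moore neighborhood
x1 = [[10, 10], [20, 20], [30, 30]]
xi = [[10.5, 10.5], [25, 20], [30, 31]]
print(repeatability(x1, xi, np.eye(3), (100, 100), (100, 100)))  # 2 of 3 repeated
```

Matching one-to-one keeps the rate at or below 1 even when several detections cluster around a single point in the other image.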

Objectives

Methods

Results

Conclusion