Abstract

Simultaneous Localisation And Mapping (SLAM) has long been recognised as a core problem in countless emerging mobile applications that require intelligent interaction with, or navigation through, an environment. Classical solutions to the problem primarily aim at localisation and the reconstruction of a geometric 3D model of the scene. More recently, the community has increasingly investigated the development of Spatial Artificial Intelligence (Spatial AI), an evolutionary paradigm that pursues the simultaneous recovery of the object-level composition and semantic annotations of the recovered 3D model. Several interesting approaches have already been presented, producing object-level maps with both geometric and semantic properties rather than just accurate and robust localisation performance. As such, they require much broader ground truth information for validation purposes. We discuss the structure of the representations and optimisation problems involved in Spatial AI, and propose new synthetic datasets that, for the first time, include accurate ground truth information about the scene composition as well as individual object shapes and poses. We furthermore propose evaluation metrics for all aspects of such joint geometric-semantic representations and apply them to a new semantic SLAM framework. It is our hope that the introduction of these datasets and proper evaluation metrics will be instrumental in the evaluation of current and future Spatial AI systems, and as such contribute substantially to overall research progress on this important topic.

Highlights

  • Research at the intersection of robotics and computer vision currently targets the development of a handful of important applications in which some form of artificial, intelligent device is employed to assist humans in the execution of everyday tasks

  • The proposed metrics for Spatial Artificial Intelligence (AI) systems evaluate all aspects of the Hybrid Graphical Model, i.e. the joint geometric-semantic representation of the recovered scene

  • In order to evaluate the overall capacity of a Spatial AI algorithm to parse a scene and gain a correct understanding of the composition and semantics of the individual objects, we propose to take the Intersection over Union (IoU) between the ground truth and the inferred label distributions (a minimal sketch of this metric is given below)
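
The paper itself provides no code, but the IoU-based semantic evaluation in the last highlight can be illustrated with a short sketch. The version below is a simplification that assumes hard per-point (or per-voxel) integer class labels rather than full label distributions; the names `per_class_iou`, `gt_labels` and `pred_labels` are illustrative and not taken from the paper.

```python
import numpy as np

def per_class_iou(gt_labels, pred_labels, num_classes):
    """Per-class Intersection over Union between ground-truth and
    inferred semantic labels, given as one integer class index per
    point/voxel. Classes absent from both inputs yield NaN."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        gt_c = gt_labels == c
        pred_c = pred_labels == c
        union = np.logical_or(gt_c, pred_c).sum()
        if union > 0:
            ious[c] = np.logical_and(gt_c, pred_c).sum() / union
    return ious

# Example: mean IoU over three classes, ignoring absent classes.
gt = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
print(np.nanmean(per_class_iou(gt, pred, num_classes=3)))
```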


Introduction

Research at the intersection of robotics and computer vision currently targets the development of a handful of important applications in which some form of artificial, intelligent device is employed to assist humans in the execution of everyday tasks. Service robots able to execute complex tasks such as moving and manipulating arbitrary objects are one such application. Another is intelligence augmentation: eye-wear such as the Microsoft HoloLens already points at the future form of highly portable, smart devices capable of assisting humans during the execution of everyday tasks. To assess the quality of the estimated background geometry, we take each 3D point in the estimated point cloud and find its nearest neighbour in the ground truth background model; the error is then expressed as the mean or median of all distances between nearest-neighbour pairs
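
The nearest-neighbour evaluation of background geometry described above can be sketched in a few lines. This is a minimal illustration, assuming both the estimated point cloud and the ground truth background model are given as N×3 NumPy arrays; the function name `background_geometry_error` is illustrative and not from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def background_geometry_error(estimated_pts, gt_pts, use_median=False):
    """Mean (or median) distance from each estimated 3D point to its
    nearest neighbour in the ground truth background model."""
    tree = cKDTree(gt_pts)                 # index the ground truth cloud
    dists, _ = tree.query(estimated_pts)   # nearest-neighbour distances
    return np.median(dists) if use_median else np.mean(dists)
```

Reporting the median alongside the mean makes the score more robust to a small number of gross outliers in the reconstruction.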
