A robot's ability to build a persistent, accurate, and actionable model of its surroundings from sensor data, and to do so in a timely manner, is crucial for autonomous operation. While representing the world as a point cloud may suffice for localization, denser scene representations are required for obstacle avoidance. Higher-level semantic information, in turn, is often crucial for breaking down the steps needed to autonomously complete a complex task, such as cooking. This raises a central question: what is a suitable scene representation for the robotic task at hand? This survey provides a comprehensive review of key approaches and frameworks driving progress in robotic spatial perception, with a particular focus on the historical evolution and current trends in scene representation. By categorizing scene modeling techniques into three main types (metric, metric–semantic, and metric–semantic–topological), we discuss how spatial perception frameworks are transitioning from purely geometric models of the world to richer data structures that incorporate higher-level concepts, such as object instances and places. Special emphasis is placed on approaches for real-time simultaneous localization and mapping, their integration with deep learning for enhanced robustness and scene understanding, and their ability to handle dynamic scenes, which are among the most active topics driving robotics research today. We conclude with a discussion of ongoing challenges and future research directions in the quest to develop robust and scalable spatial perception systems suitable for long-term autonomy.