Abstract

Rapid advances in visual-inertial simultaneous localization and mapping (SLAM) have opened up numerous applications in computer vision. However, the scarcity of high-quality, publicly accessible datasets hampers the evaluation of SLAM performance in varied and tailored environments. In this study, I used the AirSim simulator and Unreal Engine 4 to generate a trajectory resembling that of the TUM VI Room 1 ground-truth dataset within the ArchViz indoor environment, a well-lit, furnished room. I further modified the environment and trajectory through various expansions, the addition of visual features, and data smoothing to produce a more stable sequence of input frames for the SLAM pipeline. I then examined the performance of visual ORB-SLAM3 on input images of resolution 256×144 and 512×288 at 30 frames per second (FPS), while also varying the feature threshold, i.e., the maximum number of feature points ORB-SLAM3 tracks per frame. This investigation of the camera parameters in AirSim and ORB-SLAM3 yielded the essential finding that the resolution of the input images must match the film-back dimensions of the simulated camera. Subsequent runs under these variables show that higher-resolution images lead to considerably better tracking, with an optimal feature threshold between 3000 and 12000 feature points per frame. Moreover, ORB-SLAM3 proved significantly more robust in dynamic environments containing moving objects when given higher-resolution inputs, with tracking error falling to nearly 0 cm, compared with 23.19 cm at the lower resolution (averaged over three runs). Finally, I conducted qualitative testing on real-life indoor environments recorded with an iPhone XR camera; the results highlight the challenges ORB-SLAM3 faces under glare and motion blur.
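
For orientation only (the paper's pipeline itself is not reproduced here), the sketch below shows how a frame could be pulled from AirSim over its official Python client. The camera name "0" is AirSim's default and an assumption on my part; the capture resolution is governed by the CaptureSettings block of AirSim's settings.json rather than by the request itself, and the "feature threshold" discussed above presumably maps to the ORBextractor.nFeatures entry in ORB-SLAM3's camera YAML.

    import airsim  # official AirSim Python client (pip install airsim)

    # Connect to the AirSim simulation running inside Unreal Engine 4.
    client = airsim.MultirotorClient()
    client.confirmConnection()

    # Request one uncompressed scene image from camera "0" (AirSim's
    # default camera name; which camera the study used is an assumption).
    responses = client.simGetImages([
        airsim.ImageRequest("0", airsim.ImageType.Scene,
                            pixels_as_float=False, compress=False)
    ])
    frame = responses[0]

    # The frame's dimensions are fixed by the CaptureSettings block of
    # AirSim's settings.json, not by the request above; per the finding
    # in the abstract, they must match the resolution declared in the
    # ORB-SLAM3 camera YAML (Camera.width / Camera.height), e.g. 512x288.
    print(f"captured {frame.width}x{frame.height}, "
          f"{len(frame.image_data_uint8)} bytes")

A 30 FPS input stream at a fixed resolution would then amount to repeating this request in a timed loop and writing the frames out in the format ORB-SLAM3 expects.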
