Abstract

Simultaneous Localization and Mapping (SLAM), one of the key elements of robot vision, has been an active research topic for the past three decades. SLAM aims to reconstruct a map of the robot's surroundings from sensors such as cameras or LiDAR while simultaneously estimating the robot's own location within that map. Researchers have developed many different techniques and algorithms to improve the accuracy of SLAM. The main difference among these techniques lies in the choice of sensor used to solve the SLAM problem. Some approaches are based on LiDAR sensors, known as LiDAR SLAM, while others are based on cameras, e.g., monocular, stereo, or RGB-D cameras, known as visual SLAM (VSLAM). We also review how deep learning methods such as CNNs and RNNs optimize VSLAM computation and replace some modules of the traditional SLAM framework. Comparing the most recent techniques, we first outline their general differences and then highlight explicit differences in terms of applications. Finally, we discuss the advantages and drawbacks of both approaches and propose open challenges and future directions for each.