A mapping system by vision-aided inertial navigation was developed for areas where GNSS signals are unreachable. In this framework, a methodology on the integration of vision and inertial sensors is presented, analysed and tested. The system employs the method of “SLAM: Simultaneous Localisation And Mapping” where the only external input available to the system at the beginning of the mapping mission is a number of features with known coordinates. SLAM is a term used in the robotics community to describe the problem of mapping the environment and at the same time using this map to determine the location of the mapping device. Differing from the robotics approach, the presented development stems from the frameworks of photogrammetry and kinematic geodesy that are merged in two filters that run in parallel: the Least-Squares Adjustment (LSA) for features coordinates determination and the Kalman filter (KF) for navigation correction. To test this approach, a mapping system-prototype comprising two CCD cameras and one Inertial Measurement Unit (IMU) is introduced. Conceptually, the outputs of the LSA photogrammetric resection are used as the external measurements for the KF that corrects the inertial navigation. The filtered position and orientation are subsequently employed in the photogrammetric intersection to map the surrounding features that are used as control points for the resection in the next epoch. We confirm empirically the dependency of navigation performance on the quality of the images and the number of tracked features, as well as on the geometry of the stereo-pair. Due to its autonomous nature, the SLAM's performance is further affected by the quality of IMU initialisation and the a-priory assumptions on error distribution. Using the example of the presented system we show that centimetre accuracy can be achieved in both navigation and mapping when the image geometry is optimal.