Image-based mobile mapping systems enable the efficient acquisition of georeferenced image sequences, which can subsequently be used for geodata capture. Providing accurate measurements in a given reference frame, e.g. for high-fidelity 3D urban models, requires high-quality georeferencing of the captured multi-view image sequences. Moreover, sub-pixel accurate orientations of these highly redundant image sequences are needed to optimally perform steps such as dense multi-image matching, which is a prerequisite for 3D point cloud and mesh generation. While direct georeferencing of image-based mobile mapping data performs well in open areas, poor GNSS coverage in urban canyons makes these high accuracy requirements difficult to meet, even with high-grade inertial navigation equipment. Hence, we conducted comprehensive investigations to assess the quality of directly georeferenced sensor orientations as well as the improvement to be expected from image-based georeferencing in a challenging urban environment. Our study repeatedly revealed mean trajectory deviations of up to 80 cm. By performing image-based georeferencing using bundle adjustment with a limited set of cameras and a limited number of ground control points, mean check point residuals could be lowered from approx. 40 cm to 4 cm. Furthermore, we showed that largely automated image-based georeferencing is capable of detecting and compensating for discontinuities in directly georeferenced trajectories.