Purpose: We present a markerless, vision-based method for on-the-fly three-dimensional (3D) pose estimation of a fiberscope instrument, used to target pathologic areas in the endoscopic view during exploration. Approach: A 2.5-mm-diameter fiberscope is inserted through the endoscope's operating channel and connected to an additional camera to provide complementary observation of a targeted area, acting as a multimodal magnifier. The 3D pose of the fiberscope is estimated frame by frame by maximizing the similarity between its silhouette (automatically detected in the endoscopic view using a deep neural network) and a cylindrical shape bound to a kinematic model reduced to three degrees of freedom. An alignment of the cylinder axis, based on Plücker coordinates computed from the straight edges detected in the image, makes convergence faster and more reliable. Results: Performance was validated in simulation, with a virtual trajectory mimicking endoscopic exploration, and on real images of a chessboard pattern acquired with different endoscopic configurations. The experiments demonstrated good accuracy and robustness of the proposed algorithm, with errors of in distance position and in axis orientation for the 3D pose estimation, which indicates its superiority over previous approaches. This allows multimodal image registration with sufficient accuracy of . Conclusion: Our pose estimation pipeline was evaluated on simulations and on real pattern images; the results demonstrate the robustness of our method and the potential of image-based tracking of fiber-optic instruments for pose estimation and multimodal registration. The method can be implemented entirely in software and is therefore easy to integrate into a routine clinical environment.
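
The axis-alignment idea in the Approach can be illustrated with a minimal sketch. It is not the authors' implementation and rests on assumptions not stated in the abstract: a pinhole camera with known intrinsic matrix `K`, and two detected straight edges that are the apparent contours of the cylinder, given as homogeneous image lines. In Plücker form a 3D line is a pair (direction `d`, moment `m = p × d`), and the moment is normal to the plane through the camera center that contains the line. Both contour generatrices are parallel to the cylinder axis, so the axis direction is perpendicular to both moments. All function names are hypothetical.

```python
import numpy as np

def plucker_from_points(p, q):
    """Plücker coordinates (direction, moment) of the 3D line through p and q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = q - p                 # line direction
    m = np.cross(p, q)        # moment; equals p x d, so d . m = 0
    return d, m

def backprojected_plane_normal(K, line_px):
    """Unit normal of the plane through the camera center that projects onto
    the homogeneous image line (a, b, c), where a*u + b*v + c = 0."""
    n = K.T @ np.asarray(line_px, float)
    return n / np.linalg.norm(n)

def axis_direction_from_edges(K, edge1_px, edge2_px):
    """Cylinder axis direction (up to sign) from its two apparent image edges.
    Assumes the two edges are not identical (otherwise the cross product vanishes)."""
    n1 = backprojected_plane_normal(K, edge1_px)
    n2 = backprojected_plane_normal(K, edge2_px)
    d = np.cross(n1, n2)      # perpendicular to both plane normals
    return d / np.linalg.norm(d)
```

In a pipeline of the kind described above, such a direction estimate would serve to initialize or constrain the cylinder axis before the silhouette-similarity optimization. The sign of the direction remains ambiguous and has to be resolved from the kinematic model.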