Summary
Visual simultaneous localization and mapping (SLAM) is widely used in mobile robots and autonomous driving. However, when the environment changes, the accuracy of the system drops sharply. To address the low accuracy of local feature matching and tracking optimization in current visual SLAM methods, this work uses an attention map neural network and the Sinkhorn algorithm to extract and match image features. It then improves the real-time performance of the visual SLAM system through local optimization of feature point matching, and proposes a visual SLAM optimization method based on an end-to-end algorithm. In experiments on the HPatches View dataset, homography estimation improved by 0.092, matching accuracy by 0.087, and matching recall by 0.038. On the HPatches Illum dataset, homography estimation improved by 0.015, matching accuracy by 0.076, and matching recall by 0.036. These results indicate that local feature matching and tracking optimization based on the attention map neural network and the Sinkhorn algorithm can improve matching accuracy for a visual SLAM system in dynamic environments and under changing lighting conditions.
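To make the Sinkhorn matching step named in the summary more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small square matrix of pairwise feature similarity scores (the values in `scores` are made up for illustration) and uniform marginals. The names `sinkhorn` and `mutual_matches` and the `threshold` value are hypothetical, and refinements such as handling of unmatched points are omitted.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sinkhorn(scores, iterations=100):
    """Turn a square score matrix into a near doubly stochastic soft assignment.

    Works in the log domain: rows and columns are normalized alternately
    so that each row and each column of the result sums to about 1.
    """
    n = len(scores)
    log_p = [row[:] for row in scores]
    for _ in range(iterations):
        # Row normalization: subtract each row's log-sum-exp.
        for i in range(n):
            z = logsumexp(log_p[i])
            log_p[i] = [v - z for v in log_p[i]]
        # Column normalization: subtract each column's log-sum-exp.
        for j in range(n):
            z = logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= z
    return [[math.exp(v) for v in row] for row in log_p]


def mutual_matches(assignment, threshold=0.2):
    """Keep pairs (i, j) that are each other's best match above a threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    n = len(assignment)
    matches = []
    for i in range(n):
        j = max(range(n), key=lambda c: assignment[i][c])
        best_i = max(range(n), key=lambda r: assignment[r][j])
        if best_i == i and assignment[i][j] > threshold:
            matches.append((i, j))
    return matches


# Hypothetical similarity scores between 3 features in image A (rows)
# and 3 features in image B (columns).
scores = [
    [4.0, 1.0, 0.5],
    [0.8, 3.5, 1.2],
    [0.3, 1.1, 5.0],
]
assignment = sinkhorn(scores)
matches = mutual_matches(assignment)
print(matches)
```

In this sketch, the alternating normalization spreads each feature's matching mass so that no feature in either image is assigned more than once in total, and the mutual-best check then extracts hard correspondences from the soft assignment.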