In urban environments, inspection robots face complex terrain and variable motion states, which place high demands on their positioning systems. Although integrating a Micro-Electro-Mechanical System Inertial Navigation System (MEMS-INS) with the Global Positioning System (GPS) provides continuous positioning information, tall buildings and tunnels in cities can block GPS signals, causing signal interruptions and increased positioning errors. During GPS outages, MEMS-INS errors accumulate over time and severely degrade positioning accuracy. To address this issue, this paper proposes an adaptive error-state Kalman filter (AESKF), which employs an adaptive mechanism to mitigate the impact of MEMS-INS noise and reduce reliance on the process model. Additionally, a deep learning framework that combines the self-attention mechanism of the Transformer with a Long Short-Term Memory (LSTM) module trained with a custom loss function is proposed to predict the position increments of the inspection robot. Combining AESKF with Transformer-LSTM improves the positioning accuracy of the inspection robot during GPS outages in dynamic urban environments. Simulation and field experiment results demonstrate that the combination of AESKF and Transformer-LSTM significantly improves positioning accuracy. Compared with other mature methods, the Root Mean Square Error (RMSE) of positioning is reduced by up to 83.64 % in the north direction and 89.56 % in the east direction. When the GPS signal interruption lasts for 10 s and 60 s, the maximum standard deviation (STD) of the position error is 0.1186 m and 1.0417 m, respectively.
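The accuracy figures above are expressed as per-direction RMSE, relative RMSE reduction, and the STD of the position error. As a minimal sketch of how such metrics are commonly computed, the following may help. The function names are hypothetical, the population form of the STD is an assumption (the text does not state which form is used), and the numbers in the usage lines are illustrative placeholders rather than results from the experiments.

```python
import math


def rmse(errors):
    """Root mean square of position errors along one axis (metres)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def std(errors):
    """Population standard deviation of position errors (metres).

    Assumption: population form (divide by N), not the sample form (N - 1).
    """
    mean = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))


def rmse_reduction_percent(baseline_rmse, proposed_rmse):
    """Relative RMSE reduction of a proposed method versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse


# Illustrative placeholder errors (estimate minus reference), north axis, in metres.
baseline_north_errors = [2.0, -2.0, 2.0, -2.0]
proposed_north_errors = [0.5, -0.5, 0.5, -0.5]

reduction = rmse_reduction_percent(
    rmse(baseline_north_errors), rmse(proposed_north_errors)
)
print(f"north RMSE reduction: {reduction:.2f} %")
```

The same functions would be applied separately to the east-direction errors, and the STD would be evaluated over each outage window (for example 10 s or 60 s) to obtain figures comparable to those reported.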