Timing mismatch between different stages of physical design makes it challenging for circuit optimization to achieve the desired performance, power, and area (PPA) tradeoff. Inaccurate timing estimation prior to routing may lead either to over-design, with unnecessary power and area consumption, or to iterating back to cell placement at the cost of design turn-around time. Existing learning models cannot predict post-routing circuit timing with satisfactory accuracy and efficiency, because they ignore the delay correlation along the timing path and rely on empirical feature selection. In this work, an accurate and efficient pre-routing path delay prediction framework is proposed that combines a transformer network and a residual model with an ensemble feature selection mechanism. The ensemble feature selection mechanism combines filter and wrapper methods to determine the optimal feature subset from the timing and physical information available at the placement stage; the selected features are extracted as feature sequences for each cell along the timing path and used to train the transformer network. With the residual model, the timing mismatch between the placement and routing stages predicted by the transformer network is further calibrated to estimate the post-routing path delay. The proposed framework has been validated on ISCAS’85 and OpenCores benchmark circuits for post-routing path delay prediction: the prediction error in terms of rRMSE is within 1.3% and 3.0%, and the correlation coefficient R is higher than 0.999 and 0.995, for seen and unseen circuits respectively, indicating an error reduction of 2.3 to 10.6 times compared with prior learning-based models.
In addition, the framework achieves an average speedup of three orders of magnitude over commercial tools and is 14 to 128 times faster than competing learning models, making it promising for guiding design optimization prior to the time-consuming routing stage.
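The two accuracy metrics quoted above, rRMSE and the correlation coefficient R, can be illustrated with a minimal sketch. The abstract does not give the exact definition of rRMSE, so the code assumes one common form: the root-mean-square error normalized by the mean of the reference (post-routing) delays. R is taken to be the Pearson correlation coefficient. The function names and the sample delay values are hypothetical and serve only as illustration.

```python
import math


def rrmse(actual, predicted):
    """Relative RMSE: RMSE divided by the mean of the reference values.

    Assumption: this normalization is one common convention and may differ
    from the definition used in the paper.
    """
    n = len(actual)
    mse = sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n
    return math.sqrt(mse) / (sum(actual) / n)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    # Hypothetical path delays in picoseconds, for illustration only.
    post_route = [100.0, 200.0, 300.0, 400.0]
    predicted = [101.0, 199.0, 303.0, 398.0]
    print(f"rRMSE = {rrmse(post_route, predicted):.4%}")
    print(f"R     = {pearson_r(post_route, predicted):.6f}")
```

Under this definition, an rRMSE of 1.3% means the typical prediction error is about 1.3% of the average post-routing path delay, while R close to 1 indicates that predicted and reference delays preserve nearly the same linear relationship across paths.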