Respiration-induced tumor motion is a major challenge in lung cancer radiotherapy because it can degrade treatment precision and efficacy. This study presents a deep learning-based approach for real-time, markerless lung tumor tracking from orthogonal X-ray projection images. The approach has three components: (1) a data augmentation technique that combines a hybrid deformable model with a 3D thin-plate spline transformation, (2) a Transformer-based segmentation network for tumor boundary delineation, and (3) a CNN regression network for 3D tumor position estimation. We evaluated the approach on patient data from The Cancer Imaging Archive (TCIA) and on CIRS dynamic thorax phantom data, assessing performance across several noise levels and comparing it with current leading algorithms. For the TCIA patient data, the average Dice similarity coefficient (DSC) and 95th-percentile Hausdorff distance (HD95) were 0.9789 and 1.8423 mm, respectively, with an average centroid localization deviation of 0.5441 mm. On the CIRS phantoms, DSCs were 0.9671 (large tumor) and 0.9438 (small tumor), with corresponding HD95 values of 1.8178 mm and 1.9679 mm, and the 3D centroid localization error was consistently below 0.33 mm. Processing time averaged 90 ms/frame. Even under high noise conditions (S2 = 25), errors for all data remained within 1 mm, with tracking success rates mostly at 100%. In conclusion, the proposed markerless tracking method demonstrates superior accuracy, noise robustness, and real-time performance for lung tumor localization during radiotherapy. It could improve treatment precision, especially for small tumors, and thereby support more effective and more personalized radiotherapy.
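For readers unfamiliar with the reported metrics, the following is a minimal sketch of how an overlap score, a 95th-percentile Hausdorff distance, and a centroid deviation could be computed from a predicted and a reference binary mask. It uses the common textbook definitions, which are not confirmed to match the authors' implementation. The function names, the `spacing` parameter (voxel size in mm), and the toy masks are all assumptions made for illustration, and both masks are assumed to be non-empty.

```python
import numpy as np

def dice(pred, ref):
    """Dice similarity coefficient between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)

def centroid_error_mm(pred, ref, spacing):
    """Euclidean distance (mm) between the centroids of two non-empty masks."""
    spacing = np.asarray(spacing, dtype=float)
    c_pred = np.argwhere(pred).mean(axis=0) * spacing
    c_ref = np.argwhere(ref).mean(axis=0) * spacing
    return float(np.linalg.norm(c_pred - c_ref))

def surface_points_mm(mask, spacing):
    """Physical coordinates of foreground voxels that touch the background."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    crop = tuple(slice(1, -1) for _ in range(mask.ndim))
    interior = np.ones_like(mask)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            # A voxel is interior only if every face neighbour is foreground.
            interior &= np.roll(padded, shift, axis=axis)[crop]
    surface = mask & ~interior
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)

def hd95_mm(pred, ref, spacing):
    """Symmetric 95th-percentile Hausdorff distance (mm) between mask surfaces.

    Brute-force pairwise distances: adequate for small masks only.
    """
    p = surface_points_mm(pred, spacing)
    r = surface_points_mm(ref, spacing)
    d = np.linalg.norm(p[:, None, :] - r[None, :, :], axis=-1)
    return float(max(np.percentile(d.min(axis=1), 95),
                     np.percentile(d.min(axis=0), 95)))

# Toy example (synthetic data, not from the study): a 10-voxel cube and a
# copy shifted by one voxel along the first axis, with 1 mm isotropic voxels.
ref_mask = np.zeros((20, 20, 20), dtype=bool)
ref_mask[5:15, 5:15, 5:15] = True
pred_mask = np.zeros_like(ref_mask)
pred_mask[6:16, 5:15, 5:15] = True
voxel_mm = (1.0, 1.0, 1.0)

print("DSC:", dice(pred_mask, ref_mask))
print("HD95 (mm):", hd95_mm(pred_mask, ref_mask, voxel_mm))
print("Centroid error (mm):", centroid_error_mm(pred_mask, ref_mask, voxel_mm))
```

In this toy case the two cubes overlap in 900 of their 1000 voxels each, so the DSC is 0.9, and the centroid error equals the 1 mm shift. HD95 conventions vary between toolkits (surface points versus all foreground points, and how the two directed distances are combined), so values from different implementations are not always directly comparable.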