<h3>Purpose/Objective(s)</h3> Dual-energy (DE) fluoroscopy is being considered for markerless tumor tracking (MTT) of lung tumors. The principal advantage of DE imaging over single-energy (SE) fluoroscopy is that it removes overlying bony anatomy, which improves tumor tracking accuracy. However, clinical use of DE imaging requires specific hardware and software that are not widely available. To address this limitation, we developed and implemented a convolutional neural network (CNN) to generate synthetic DE (sDE) images from SE images. The goal of this study was to evaluate the accuracy of MTT using sDE images compared with SE and real DE (rDE) images. <h3>Materials/Methods</h3> MTT was evaluated using a thoracic motion phantom consisting of a torso with embedded ribs and spine and a lung-equivalent cavity. A simulated tumor (10 mm diameter) was placed inside the lung-equivalent compartment of the phantom and driven with a programmed cos<sup>4</sup> waveform to simulate breathing (peak-to-peak amplitude = 15 mm, period = 5 s). While the target was moving, fast kV-switching images (alternating 60 and 120 kVp) were acquired using the on-board imager (OBI) of a commercial linear accelerator. Images were acquired over 360 degrees of rotation, so that the target projection overlapped with varying amounts of bone. Weighted logarithmic subtraction was performed offline on consecutive 60 and 120 kVp projections to produce rDE images. Separately, a CNN was trained to produce sDE images directly from the 120 kVp images (which are used clinically for SE fluoroscopy). The CNN uses a U-Net-type architecture, with skip connections that pass low-level information across the network for use in image translation. A template-based matching algorithm was then used to track target motion on SE (120 kVp), rDE, and sDE images. 
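The weighted logarithmic subtraction step can be illustrated with a minimal sketch. It assumes the common form DE = ln(I_high) - w * ln(I_low), where the weight w is chosen to cancel the bone signal. The function name, the default weight, and the attenuation values in the toy example are hypothetical and are not taken from this study.

```python
import numpy as np

def weighted_log_subtraction(high_kvp, low_kvp, weight=0.5, eps=1e-6):
    """Combine a high-kVp and a low-kVp projection into a DE image.

    Assumed form: DE = ln(I_high) - weight * ln(I_low).
    `weight` and `eps` are illustrative defaults, not values from the study.
    """
    high = np.asarray(high_kvp, dtype=np.float64)
    low = np.asarray(low_kvp, dtype=np.float64)
    if high.shape != low.shape:
        raise ValueError("projections must have the same shape")
    # Clip to avoid log(0) in fully attenuated pixels.
    log_high = np.log(np.clip(high, eps, None))
    log_low = np.log(np.clip(low, eps, None))
    return log_high - weight * log_low

# Toy example: two pixels with the same soft-tissue path but different bone
# thickness. The attenuation coefficients are made up so that weight = 0.5
# (the ratio of the bone coefficients) cancels the bone term exactly.
bone_thickness = np.array([0.0, 2.0])
high = np.exp(-(0.20 * 10.0 + 0.3 * bone_thickness))  # stand-in for 120 kVp
low = np.exp(-(0.25 * 10.0 + 0.6 * bone_thickness))   # stand-in for 60 kVp
de_image = weighted_log_subtraction(high, low, weight=0.5)
```

In this toy case both pixels take the same DE value, regardless of bone thickness, which is the bone-cancellation property the subtraction is intended to provide.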
A quantitative analysis was performed based on tracking success rate (TSR), with the programmed cos<sup>4</sup> waveform serving as ground truth (GT). <h3>Results</h3> A total of 449 SE, rDE, and sDE image frames were evaluated for the thoracic phantom with a 10 mm moving target. TSR values based on < 1 mm agreement with GT were 53.5%, 89.7%, and 74.2% for SE, rDE, and sDE images, respectively. With a < 2 mm agreement criterion, TSR values were 71.9% for SE, 93.5% for rDE, and 89.5% for sDE. Lower TSR values were observed on sDE and SE images where the simulated tumor overlapped the spine. <h3>Conclusion</h3> sDE images generated using a CNN yielded a higher TSR than SE images, with TSR values close to those of rDE images. Further development of the CNN is required to improve the tracking accuracy of sDE images in the spine region. These preliminary findings suggest that sDE images generated from SE images could allow tracking with DE-like images on any OBI, without the need for additional hardware/software.
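As an illustration of the TSR metric reported in the Results, the sketch below generates a cos<sup>4</sup> ground-truth trace and computes the percentage of frames whose tracking error falls below a tolerance. The exact waveform parameterization (amplitude * cos^4(pi * t / period)), the function names, and the sample errors are assumptions made for illustration. They are not the study's implementation or data.

```python
import numpy as np

def cos4_position(t, amplitude_mm=15.0, period_s=5.0, phase=0.0):
    """Assumed ground-truth trace: amplitude * cos^4(pi * t / period - phase)."""
    t = np.asarray(t, dtype=np.float64)
    return amplitude_mm * np.cos(np.pi * t / period_s - phase) ** 4

def tracking_success_rate(tracked_mm, truth_mm, tolerance_mm):
    """Percentage of frames with |tracked - truth| strictly below the tolerance."""
    tracked = np.asarray(tracked_mm, dtype=np.float64)
    truth = np.asarray(truth_mm, dtype=np.float64)
    errors = np.abs(tracked - truth)
    return 100.0 * float(np.mean(errors < tolerance_mm))

# Hypothetical example: four frames with made-up tracking errors (mm).
frame_times = np.array([0.0, 1.0, 2.5, 4.0])
truth = cos4_position(frame_times)
tracked = truth + np.array([0.5, 1.5, 2.5, 0.0])

tsr_1mm = tracking_success_rate(tracked, truth, tolerance_mm=1.0)
tsr_2mm = tracking_success_rate(tracked, truth, tolerance_mm=2.0)
```

With these made-up errors, two of the four frames fall within 1 mm and three fall within 2 mm, so the 2 mm TSR is higher than the 1 mm TSR, mirroring the pattern in the reported values.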