Box girders are crucial load-bearing components of the superstructure in simply supported high-speed railway bridges and must retain sufficient structural rigidity throughout operation. Cracks reduce the overall stiffness of the structure, posing significant safety risks and potentially leading to substantial loss of life and property. It is therefore essential to detect cracks in the girder rapidly and accurately, particularly in the interior of box girders, where access for maintenance personnel is difficult. To address this issue, this paper proposes a robot-based framework that detects cracks in high-speed railway box girders and accurately evaluates the damage status of the structure. The framework comprises an image generation network for generating high-quality crack images, a lightweight object detection algorithm for rapidly identifying crack targets, and a high-precision semantic segmentation algorithm for accurately extracting crack pixels. Comparative analysis with mainstream algorithms validates the superiority of the proposed methods. Moreover, preliminary validation through simulated tests demonstrates the feasibility of the proposed framework, offering a novel method and theoretical support for the intelligent operation and maintenance of high-speed railway bridge structures.