Time-of-flight diffraction (ToFD) is a widely used ultrasonic nondestructive evaluation (NDE) method for locating and characterizing defects, and it sizes smooth cracks with high accuracy. However, naturally grown defects often have irregular surfaces. These surfaces complicate the received tip-diffraction waves and degrade the accuracy of defect characterization. This article proposes a self-attention (SA) deep learning method that interprets ToFD A-scan signals to size rough defects. The high-fidelity finite-element (FE) simulation software Pogo is used to generate the synthetic datasets for training and testing the deep learning model. In addition, transfer learning (TL) is used to fine-tune the model trained on Gaussian rough defects, which improves its performance in characterizing realistic thermal fatigue rough defects. An ultrasonic experiment on 2-D rough crack samples made by additive manufacturing is conducted to validate the performance of the developed model. To demonstrate the accuracy of the proposed method, the crack characterization results are compared with those obtained using the conventional Hilbert peak-to-peak sizing method. The results indicate that the deep learning method achieves significantly lower uncertainty and error in rough defect characterization than the traditional sizing approaches used in ToFD measurements.