Phase unwrapping recovers the true phase from observed wrapped phase values by adding the correct integer multiples of 2π. It is a crucial step in synthetic aperture radar (SAR) interferometry, and numerous studies have sought to improve its performance. This study explored phase unwrapping with a modified U-Net regression model, optimizing both the network structure and the training data. For the network structure, the study first compared model performance as a function of the ratio between the size of the lowest-level feature maps, which is set by the number of pooling layers, and the size of the convolutional kernels. A multi-kernel U-Net was then developed to improve robustness to variations in phase noise and phase gradient relative to a standard single-kernel U-Net. For the training data, data augmentation was used to address imbalances and to better represent the local noise characteristics of actual SAR interferograms. The training data were simulated with local noise effects derived from coherence measurements of real SAR data, and with simple noise to benchmark unwrapping performance across training datasets. The results indicated that, as long as the convolutional kernel is smaller than the feature maps at the lowest layer, increasing the number of pooling layers improves unwrapping performance. Conversely, performance decreased when the feature maps at the lowest layer were smaller than the convolutional kernel. Specifically, the single-kernel U-Net with six pooling layers and the multi-kernel U-Net with five pooling layers exhibited the best unwrapping performance. Considering both simulated and real SAR interferogram data, the mean absolute errors (MAEs) for the single- and multi-kernel U-Nets trained with simple noise were approximately 0.235 and 0.254, respectively.
In contrast, for models trained with locally variable noise simulation data, the MAEs dropped to about 0.033 and 0.032, an approximately eightfold improvement over the models trained with simple noise. For real SAR interferograms, the MAEs were 0.542 (single-kernel U-Net trained using simple noise), 0.592 (multi-kernel U-Net trained using simple noise), 0.542 (single-kernel U-Net trained using local noise), and 0.445 (multi-kernel U-Net trained using local noise, the proposed model), underscoring the significant impact of training data on unwrapping performance. The study also evaluated the statistical-cost, network-flow algorithm for phase unwrapping (SNAPHU), obtaining MAEs of about 0.043 for simulation data and 0.861 for real SAR data. Consequently, on real SAR data the multi-kernel model trained with locally variable noise simulation data achieved roughly half the error of the traditional phase unwrapping method.
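
As a rough illustration of the quantities discussed above, the following Python sketch wraps a smooth synthetic phase ramp, recovers it with NumPy's one-dimensional `numpy.unwrap`, scores the result with the mean absolute error, and computes the lowest-level feature-map size after repeated 2x2 pooling. It is not the U-Net model or the SNAPHU algorithm evaluated in the study. The function names, the 256-pixel input size, and the 3x3 kernel are hypothetical values chosen only for illustration.

```python
import numpy as np


def wrap(phase):
    """Wrap phase values into the interval [-pi, pi)."""
    return (phase + np.pi) % (2.0 * np.pi) - np.pi


def mean_absolute_error(estimate, reference):
    """Mean absolute difference between two phase arrays."""
    return float(np.mean(np.abs(estimate - reference)))


def lowest_feature_map_size(input_size, n_pool):
    """Side length of the deepest feature map, assuming each pooling
    layer halves the spatial size (2x2 pooling with stride 2)."""
    return input_size // (2 ** n_pool)


# Noise-free 1-D phase ramp spanning three full cycles (illustrative data).
true_phase = np.linspace(0.0, 6.0 * np.pi, 200)
wrapped = wrap(true_phase)

# numpy.unwrap adds multiples of 2*pi wherever neighbouring samples jump
# by more than pi. It is only a simple 1-D baseline for this sketch.
recovered = np.unwrap(wrapped)
mae = mean_absolute_error(recovered, true_phase)

# Hypothetical configuration: 256x256 input patches and 3x3 kernels.
INPUT_SIZE = 256
KERNEL_SIZE = 3
for n_pool in (4, 5, 6, 7):
    size = lowest_feature_map_size(INPUT_SIZE, n_pool)
    relation = "smaller than" if size < KERNEL_SIZE else "at least as large as"
    print(f"{n_pool} pooling layers: lowest feature map {size}x{size}, "
          f"{relation} the {KERNEL_SIZE}x{KERNEL_SIZE} kernel")
```

Under these assumed sizes, the deepest feature map falls below the kernel size only at seven pooling layers, which is the regime in which the abstract reports degraded performance. The actual input and kernel sizes used in the study are not stated here.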