To date, the safety assessment of materials, used for example in the nuclear power sector, commonly relies on a fracture mechanics analysis based on macroscopic concepts, in which a global load quantity K or J is compared to the material's fracture toughness curve. Part of the experimental effort involved in these concepts is dedicated to the quantitative analysis of fracture surfaces, which is a cumbersome and time-consuming process that is also prone to subjective bias. In this study, we introduce a methodology that employs deep learning models for macroscopic fracture surface segmentation. To counter subjective bias and improve reproducibility, we set up three distinct datasets, labeled by different experts, that correspond to typical isolated laboratory conditions and to complex real-world (multi-laboratory) scenarios. To demonstrate the benefit of our approach for the fracture mechanics assessment, we employed the models for initial crack size measurements using the area average method. Furthermore, we analyzed how structural similarity, which varies with material, specimen type, and imaging-induced variance, influences the segmentation capability. For semi-supervised learning, a weak-to-strong consistency regularization was implemented. We were able to train robust and well-generalizing models that learned feature representations from images across different domains without a significant drop in prediction quality. Our approach reduced the number of labeled images required for training by a factor of 6. The deep-learning-assisted measurements proved to be as precise as manual measurements. For the multi-laboratory data, mean measurement deviations smaller than 1 % were achieved, demonstrating the potential of our approach.
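As an illustration of how a segmentation mask can feed the initial crack size measurement, the following minimal sketch computes an area-averaged crack size from a binary mask. It assumes that the area average method divides the segmented initial crack area by the specimen thickness. The function name `area_average_crack_size` and the parameters `mm_per_pixel` and `thickness_mm` are hypothetical and are not taken from the study.

```python
import numpy as np


def area_average_crack_size(mask, mm_per_pixel, thickness_mm):
    """Return an area-averaged crack size in mm.

    Assumption (not confirmed by the source): the crack size is the
    segmented crack area divided by the specimen thickness.

    mask          2D array, nonzero where a pixel belongs to the initial crack region
    mm_per_pixel  edge length of one (square) pixel in mm
    thickness_mm  specimen thickness along the crack front in mm
    """
    mask = np.asarray(mask, dtype=bool)
    if mask.ndim != 2:
        raise ValueError("mask must be a 2D array")
    if mm_per_pixel <= 0 or thickness_mm <= 0:
        raise ValueError("mm_per_pixel and thickness_mm must be positive")
    # Each True pixel contributes one pixel area to the crack area.
    crack_area_mm2 = np.count_nonzero(mask) * mm_per_pixel ** 2
    return crack_area_mm2 / thickness_mm


# Synthetic example: the crack region covers the first 40 of 100 pixel rows
# across the full 200-pixel width of the image.
example_mask = np.zeros((100, 200), dtype=bool)
example_mask[:40, :] = True

a0 = area_average_crack_size(example_mask, mm_per_pixel=0.05, thickness_mm=10.0)
print(f"{a0:.2f} mm")  # prints "2.00 mm"
```

In practice the mask would come from the segmentation model and the pixel scale from the imaging calibration. A comparison against a manual measurement could then be expressed as a relative deviation in percent.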