Deep learning (DL) is becoming increasingly popular across application fields in the current Fourth Industrial Revolution (4IR) era, mainly because of its capability to provide accurate predictions and consistent decision-making. In bridge engineering, structural monitoring and inspection are crucial activities for disaster prevention. It is therefore an application field in which synergies between professional knowledge and sophisticated machine-based analytics can be established and can drive time-effective interventions. This paper compares DL models for detecting defects in bridges, using the following architectures: MobileNetV2, Xception, InceptionV3, NASNetMobile, Visual Geometry Group Network-16 (VGG16), and InceptionResNetV2. Different optimizers (e.g., Nadam, Adam, RMSprop, and SGD) were crossed with distinct learning rates (e.g., 1, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴, and 10⁻⁵). VGG16, Xception, and NASNetMobile showed the most stable learning curves. Moreover, Gradient-weighted Class Activation Mapping (Grad-CAM) heatmaps overlaid on the input images show that the InceptionResNetV2 and InceptionV3 models seek features outside the areas of interest (defects). In terms of optimizer performance, the adaptive optimizers outperform SGD with learning-rate decay schedulers.
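
To make the scale of this comparison concrete, the sketch below enumerates the experimental grid implied by the text. It is illustrative only: it assumes that the listed architectures, optimizers, and learning rates are exhaustive and that every combination was trained, neither of which the text states explicitly. The names `ARCHITECTURES`, `OPTIMIZERS`, `LEARNING_RATES`, and `experiment_grid` are hypothetical and not taken from the paper.

```python
from itertools import product

# Values as listed in the text; treating these lists as exhaustive is an assumption.
ARCHITECTURES = [
    "MobileNetV2", "Xception", "InceptionV3",
    "NASNetMobile", "VGG16", "InceptionResNetV2",
]
OPTIMIZERS = ["Nadam", "Adam", "RMSprop", "SGD"]
LEARNING_RATES = [10.0 ** -k for k in range(6)]  # 1, 1e-1, ..., 1e-5


def experiment_grid():
    """Return one configuration per (architecture, optimizer, learning rate).

    Assumes a full cross of the three lists (hypothetical experimental design).
    """
    return [
        {"architecture": arch, "optimizer": opt, "learning_rate": lr}
        for arch, opt, lr in product(ARCHITECTURES, OPTIMIZERS, LEARNING_RATES)
    ]


grid = experiment_grid()
print(len(grid))  # 144 under the full-cross assumption (6 x 4 x 6)
```

Under these assumptions, each architecture would be trained 24 times (4 optimizers by 6 learning rates), which is the set of runs over which learning-curve stability and optimizer performance would be compared.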