Cracks on concrete surfaces are critical factors affecting construction safety, and accurate, efficient crack detection can prevent safety-related accidents. Photographing concrete surfaces with drones and detecting cracks through computer vision offers accurate target recognition, simple operation, and low cost. For this task, an improved CenterNet concrete crack-detection model is proposed. First, a channel-spatial attention mechanism is added to the original model to strengthen the convolutional neural network's focus on relevant image features. Second, a feature selection module is introduced: the feature maps from the downsampling stage are scaled to a uniform size and concatenated along the channel dimension, and in the upsampling stage the module adaptively selects among the combined features and fuses them with the upsampled output features. Finally, the target size loss is changed from Smooth L1 Loss to IoU Loss so that it adapts better to targets of different sizes. The experimental results show that, compared with the original model, the improved CenterNet model reduces the frame rate by 123.7 FPS, increases GPU memory usage by 62 MB, increases the FLOPs by 3.81 times per second, and increases the AP by 15.4%. GPU memory occupancy remained stable during training, and the model exhibited good real-time performance and robustness.
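The motivation for replacing Smooth L1 Loss with IoU Loss in the size branch can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smooth_l1` and `iou_size_loss` are hypothetical, the loss is assumed to take the simple form 1 - IoU, and the predicted and ground-truth boxes are assumed to share the same center point, as in CenterNet's width/height regression. The example compares a small and a large target whose predictions are off by the same relative amount.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss over the (width, height) components."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for larger errors.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def iou_size_loss(pred, target, eps=1e-9):
    """1 - IoU for two boxes assumed to share the same center.

    With a common center, the overlap is simply the smaller width
    times the smaller height.
    """
    pw, ph = pred
    tw, th = target
    inter = min(pw, tw) * min(ph, th)
    union = pw * ph + tw * th - inter
    return 1.0 - inter / (union + eps)


# Both predictions are 20% too small in width and height.
small_pred, small_gt = (8.0, 16.0), (10.0, 20.0)
large_pred, large_gt = (80.0, 160.0), (100.0, 200.0)

print(f"Smooth L1, small target: {smooth_l1(small_pred, small_gt):.2f}")      # 2.50
print(f"Smooth L1, large target: {smooth_l1(large_pred, large_gt):.2f}")      # 29.50
print(f"IoU loss,  small target: {iou_size_loss(small_pred, small_gt):.2f}")  # 0.36
print(f"IoU loss,  large target: {iou_size_loss(large_pred, large_gt):.2f}")  # 0.36
```

Under these assumptions, the Smooth L1 value grows with the absolute size of the target, so large targets dominate the size loss, whereas the IoU-based value depends only on the relative overlap and is the same for both targets.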